0. Introduction

This Jupyter notebook runs on a Python 2 kernel.

Cells that start with %%R are pure R. Cells that start with %%bash are pure bash code.

With %Rpush X we can push a variable (in this case X) from Python to R, and with %Rpull X we can pull a variable from R back into Python.

A minimal introduction to programming can be gained at Codecademy, more precisely through its command line introduction (learn-the-command-line) and its Python course.

Nonetheless, readers should understand what variables are, that everything typed after a # is a comment and not part of the code, and that everything between quotes is a string, e.g. "This is a string".

Users should not hesitate to add new cells and print out the contents of a variable, e.g.

eCLIP_1_bednarrowPeak="https://www.encodeproject.org/files/ENCFF066PCT/@@download/ENCFF066PCT.bed.gz"
print eCLIP_1_bednarrowPeak

or in the case of lists,

listA=[1,23,34,11,6,8,3,3,2,90,223,2,44,5,78]
print listA[:10]

to print the first 10 elements of the list, listA.

Or as in the case of Pandas DataFrames,

print dataframeX.head()

to print the first 5 lines / head of the DataFrame, dataframeX.

In [1]:
from datetime import datetime
print datetime.now()
2016-10-29 11:58:13.060242

1. Downloading raw data, reference genome, and reference annotation

Having identified KHSRP as a molecule of interest in ENCODE, we start by downloading the files for the knock-down of KHSRP in the human cell line K562. The association graph in ENCODE shows 2 replicates for which transcript and gene quantifications are already available. We collect these files for further analysis. It is important to make sure that all downloaded data that have already been processed use the same reference genome and annotation. In this case, we use the GRCh38 v24 reference.

Equally, the controls shown in the summary section are downloaded from here.

Reference fasta and GTF files for different releases are available at gencodegenes.org.

The following step downloads all files required for differential gene expression analysis on our samples of choice:

In [2]:
%%bash 
# this makes this cell work in bash

echo "Downloading results"

# We start by creating the required folders
mkdir -p ~/work/results/rsem-results/raw_data
# Change directory into the newly created folder
cd ~/work/results/rsem-results/raw_data

# Download files with wget
wget -q https://www.encodeproject.org/files/ENCFF223LJT/@@download/ENCFF223LJT.tsv -O control.genes.results
wget -q https://www.encodeproject.org/files/ENCFF143CKD/@@download/ENCFF143CKD.tsv -O control.isoforms.results
wget -q https://www.encodeproject.org/files/ENCFF089HPB/@@download/ENCFF089HPB.tsv -O control_2.genes.results
wget -q https://www.encodeproject.org/files/ENCFF338VEF/@@download/ENCFF338VEF.tsv -O control_2.isoforms.results

wget -q https://www.encodeproject.org/files/ENCFF883SCU/@@download/ENCFF883SCU.tsv -O shRNA.genes.results
wget -q https://www.encodeproject.org/files/ENCFF959TXA/@@download/ENCFF959TXA.tsv -O shRNA.isoforms.results
wget -q https://www.encodeproject.org/files/ENCFF439UJR/@@download/ENCFF439UJR.tsv -O shRNA_2.genes.results
wget -q https://www.encodeproject.org/files/ENCFF607QXQ/@@download/ENCFF607QXQ.tsv -O shRNA_2.isoforms.results

# Change directory to one level up
cd ../
wget -q ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.primary_assembly.annotation.gtf.gz -O annotation.gtf.gz 
wget -q ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/GRCh38.primary_assembly.genome.fa.gz -O genome.fa.gz

# Unzip the downloaded reference files
unpigz annotation.gtf.gz
unpigz genome.fa.gz
Downloading results
In [3]:
print datetime.now()
2016-10-29 12:03:52.155047

2. Differential gene expression (DGE)

In ENCODE, selecting the different analysis steps on the association graph shows which analysis pipeline was used to process the data, here the 'Long RNA-seq RSEM quantification step for paired-end pipeline'. RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. RSEM comes with the downstream tool EBSeq, which can be used for differential gene expression analysis.

In bash, we can check how a program works by calling its help, e.g.

rsem-run-ebseq --help

We follow the instructions and generate differential gene expression tables from the data downloaded from ENCODE.

In [4]:
%%bash

cd ~/work/results/rsem-results/raw_data

echo "Analysing gene expression data"

rsem-generate-data-matrix shRNA.genes.results shRNA_2.genes.results control.genes.results control_2.genes.results > ../GeneMat.txt
cd ..
rsem-run-ebseq GeneMat.txt 2,2 GeneMat.results
rsem-control-fdr GeneMat.results 0.05 GeneMat.de.txt

echo "Analysing transcripts data"

cd raw_data 
rsem-generate-data-matrix shRNA.isoforms.results shRNA_2.isoforms.results control.isoforms.results control_2.isoforms.results > ../IsoformsMat.txt
cd ..
rsem-prepare-reference --gtf annotation.gtf genome.fa genome
rsem-generate-ngvector genome.transcripts.fa genome

# With the next command we collect the 1st line of the file IsoformsMat.txt 
# and redirect the output of that to IsoformsMat_b.txt  
head -n 1 IsoformsMat.txt > IsoformsMat_b.txt  

# We then read IsoformsMat.txt and grab every line that contains the
# text ENST, to collect the ENSEMBL transcripts
cat IsoformsMat.txt | grep ENST >> IsoformsMat_b.txt

rsem-run-ebseq --ngvector genome.ngvec IsoformsMat_b.txt 2,2 IsoformsMat.results
rsem-control-fdr IsoformsMat.results 0.05 IsoformsMat.de.txt
Analysing gene expression data
rsem-for-ebseq-find-DE /home/jovyan/software/RSEM-1.3.0/EBSeq # GeneMat.txt GeneMat.results 2 2
Removing transcripts with 75 th quantile < = 10 
14379 transcripts will be tested

There are 616 genes/transcripts reported at FDR = 0.05.
Analysing transcripts data
rsem-extract-reference-transcripts genome 0 annotation.gtf None 0 genome.fa
Parsed 200000 lines
Parsed 400000 lines
Parsed 600000 lines
Parsed 800000 lines
Parsed 1000000 lines
Parsed 1200000 lines
Parsed 1400000 lines
Parsed 1600000 lines
Parsed 1800000 lines
Parsed 2000000 lines
Parsed 2200000 lines
Parsed 2400000 lines
Parsing gtf File is done!
genome.fa is processed!
199348 transcripts are extracted.
Extracting sequences is done!
Group File is generated!
Transcript Information File is generated!
Chromosome List File is generated!
Extracted Sequences File is generated!

rsem-preref genome.transcripts.fa 1 genome
Refs.makeRefs finished!
Refs.saveRefs finished!
genome.idx.fa is generated!
genome.n2g.idx.fa is generated!

rsem-for-ebseq-calculate-clustering-info 25 genome.transcripts.fa genome.ump
The reference is loaded.
All possbile 25 mers are generated.
All 25 mers are sorted.
Clustering information is calculated.

rsem-for-ebseq-generate-ngvector-from-clustering-info genome.ump genome.ngvec

rsem-for-ebseq-find-DE /home/jovyan/software/RSEM-1.3.0/EBSeq genome.ngvec IsoformsMat_b.txt IsoformsMat.results 2 2
Removing transcripts with 75 th quantile < = 10 
47970 transcripts will be tested

There are 2718 genes/transcripts reported at FDR = 0.05.
Loading required package: blockmodeling
Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

    lowess

iteration 1 done 

time 4.78 

iteration 2 done 

time 2.53 

iteration 3 done 

time 2.52 

iteration 4 done 

time 2.45 

iteration 5 done 

time 2.34 

Loading required package: blockmodeling
Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

    lowess

iteration 1 done 

time 68.14 

iteration 2 done 

time 27.21 

iteration 3 done 

time 25.25 

iteration 4 done 

time 30.53 

iteration 5 done 

time 21.07 

In [5]:
print datetime.now()
2016-10-29 12:38:50.542606

3. Parsing annotation file

As you will soon see, the annotation.gtf file contains valuable information. It is therefore practical to have it in an easy-to-use format such as a DataFrame.

In [6]:
# print datetime.now()
# import pandas as pd
# import AGEpy.AGEpy as age

# # We define the variable outFolder which will contain the path for all output
# outFolder=os.path.expanduser("~")+"/work/results/"

# # We read the GTF file 
# GTF=age.readGTF(outFolder+"rsem-results/annotation.gtf")

# # Parse the read file
# parsedGTF=age.parseGTF(GTF)

# # And save the parsed version into disk.
# parsedGTF.to_csv(outFolder+"parsedGTF.tsv",sep="\t",index=None)

The step above is the most time-consuming part of this pipeline and we are therefore supplying the parsed GTF. The following step copies the parsedGTF.tsv file to your outFolder.

In [7]:
%%bash
cp ~/parsedGTF.tsv.gz ~/work/results/
cd ~/work/results/
tar -zxvf parsedGTF.tsv.gz
parsedGTF.tsv

4. Import required packages

In Python we can import packages and give them a new (short) name.

import pandas as pd

We then use functions from the specific package with pd.FunctionName(arguments). If we want to get help for the respective function we use the help function, e.g. help(pd.FunctionName).

This can be quite practical when invoking functions from a package which might exist with the same name in another package.
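As a small illustration of why the package prefix matters, the standard-library modules math and cmath both provide a function called sqrt. These two modules are chosen here only as an example and are not used elsewhere in this notebook. Importing each under its own short name keeps the two functions apart:

```python
# Both modules define sqrt(); the alias in front of the dot
# tells Python (and the reader) which one is meant.
import math as m
import cmath as cm

real_root = m.sqrt(16)        # square root on real numbers
complex_root = cm.sqrt(-16)   # square root that also accepts negative input

print(real_root)
print(complex_root)
```

Calling m.sqrt(-16) instead would raise a ValueError, which is the kind of mix-up the explicit prefix helps to avoid.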

In [8]:
%reset -f
from datetime import datetime
print datetime.now()
import pandas as pd
import numpy as np
import AGEpy.AGEpy as age
import os
import sys
from urllib import urlopen
import urllib2
import scipy
import statsmodels
from statsmodels.sandbox.stats import multicomp
import seaborn as sns
import rpy2
import matplotlib
import matplotlib.pyplot as plt
import StringIO
import gzip
import pybedtools
from pybedtools import BedTool
from wand.image import Image as WImage
import cPickle as pickle
# this allows us to see graphics directly in our browser
%matplotlib inline
# this loads the rpy2 extension, which allows us to use R in Jupyter
%load_ext rpy2.ipython
2016-10-29 12:39:06.665796
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Note: the specification for S3 class “AsIs” in package ‘DBI’ seems equivalent to one from package ‘BiocGenerics’: not turning on duplicate class definitions for this class.

  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
  warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)
In [9]:
print datetime.now()
outFolder=os.path.expanduser("~")+"/work/results/"
outFigures=outFolder+"figures/"
if not os.path.exists(outFigures):
    os.makedirs(outFigures)
2016-10-29 12:39:15.063696

To use the DAVID API you first need to register your email address and then use it as your DAVIDuser.

In [10]:
DAVIDuser="Jorge.Boucas@age.mpg.de"

ENCODE provides BED narrow peak files for peaks identified by eCLIP of KHSRP. We define the links to those bed files here:

In [11]:
print datetime.now()
# KHSRP
eCLIP_1_bednarrowPeak="https://www.encodeproject.org/files/ENCFF066PCT/@@download/ENCFF066PCT.bed.gz"
eCLIP_2_bednarrowPeak="https://www.encodeproject.org/files/ENCFF512ZBC/@@download/ENCFF512ZBC.bed.gz"
2016-10-29 12:39:15.106146

In Python, functions are defined as

def SomeFunction(input1,input2,..):
    """
    documentation
    """
    operations on input1, input2
    return result

Using functions becomes extremely practical once the code needs to be applied to more than one input group. We will mostly define functions for our operations, as we tend to re-use them, e.g. on genes data and on isoforms data.
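A minimal sketch of this idea follows. The function CountAbove and the two example lists are hypothetical and only serve to show one function being re-used on two input groups:

```python
def CountAbove(values, threshold):
    """
    Counts how many entries of a list are above a threshold.

    :param values: list of numbers
    :param threshold: number to compare against

    :returns: an integer
    """
    return len([v for v in values if v > threshold])

# Hypothetical example inputs, not taken from the analysis
genes_example = [0.5, 12.0, 3.2, 40.1]
isoforms_example = [7.7, 0.1, 15.3]

# The same function is applied to both input groups
n_genes = CountAbove(genes_example, 10)
n_isoforms = CountAbove(isoforms_example, 10)

print(n_genes)
print(n_isoforms)
```

If the operation ever needs to change, it only has to be changed in one place.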

5. Exploring DGE results

5.1 Read data

In [12]:
print datetime.now()
def GetData(all_file,sig_file,indexLabel):
    """
    Reads rsem-run-ebseq and rsem-control-fdr output.
    
    :param all_file: /path/to/rsem-run-ebseq/output.file
    :param sig_file: /path/to/rsem-control-fdr/output.file
    :param indexLabel: 'transcript_id' or 'gene_id'
    
    :returns: a Pandas dataframe
    """
    
    # we use pandas to read the tab separated file
    df=pd.read_table(all_file)
    
    # we define the headers for the dataframe in a list 
    cols_=["PPEE","PPDE","shRNA/control","RealFC","shRNA","control"]
    
    # we assign the defined list as the header of the dataframe
    df.columns=cols_
    
    # we set the index into a column with the header
    # of our choice ie. indexLabel
    df[indexLabel]=df.index.tolist()
    
    df.reset_index(inplace=True,drop=True)
    
    # we make sure the columns have the order we wish, ie.
    # first the indexLabel variable and then the remaining headers
    cols=[indexLabel]
    for c in cols_:
        cols.append(c)
    df=df[cols]

    
    df_=pd.read_table(sig_file)
    
    # after reading the file containing the significant changes
    # we collect the ids of those genes from the index
    sigGenes=df_.index.tolist()

    # we define a function that returns "yes" if a given id (x) is
    # in the list of significant genes (sigGenes)
    def CheckSig(x,sigGenes=sigGenes):
        if x in sigGenes:
            return "yes"
        else:
            return "no"
    
    # DataFrame[column_for_x].apply(lambda x: function(x)) calls the
    # function once for every value of the column
    df["sig"]=df[indexLabel].apply(lambda x: CheckSig(x))
    
    df["log2(shRNA/control)"]=df["shRNA/control"].apply(lambda x: np.log2(x))
    return df

dfGenes=GetData(outFolder+"rsem-results/GeneMat.results",\
                outFolder+"rsem-results/GeneMat.de.txt",\
                "gene_id")
dfTranscripts=GetData(outFolder+"rsem-results/IsoformsMat.results",\
               outFolder+"rsem-results/IsoformsMat.de.txt",\
               "transcript_id")
2016-10-29 12:39:15.137961
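The "sig" labelling in GetData() checks every id against a Python list. As an alternative sketch, assuming only that pandas is available, the same "yes"/"no" column can be built with the vectorised isin() method. The toy dataframe and id list below are hypothetical stand-ins for the real inputs:

```python
import pandas as pd

# Hypothetical toy data standing in for the GetData() inputs
df_example = pd.DataFrame({"gene_id": ["geneA", "geneB", "geneC"]})
sig_example = ["geneB"]

# isin() returns a boolean Series, True where the id is in the list
is_sig = df_example["gene_id"].isin(sig_example)

# map() translates the booleans into the "yes"/"no" labels used above
df_example["sig"] = is_sig.map({True: "yes", False: "no"})

print(df_example)
```

On tables with many thousands of rows this avoids one list scan per row, while the resulting column has the same content.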

5.2 Selecting significantly changed genes and transcripts

In [13]:
print datetime.now()
samples=["control","shRNA"]
list_of_comparisons=["control","shRNA"]

# we define the list of significant genes by collecting them from 
# the dfGenes dataframe after subsetting it to ["sig"]=="yes"
# in Pandas this can be done like this:
# dataframe[dataframe["column of interest"]=="value of interest"]
sigGenes=dfGenes[dfGenes["sig"]=="yes"]["gene_id"].tolist()
sigTranscripts=dfTranscripts[dfTranscripts["sig"]=="yes"]["transcript_id"].tolist()
2016-10-29 12:39:17.677502

5.3 Distribution of expression values

In [14]:
print datetime.now()
def plotKDE(df,title,figName,targets=None,samples=samples):
    """
    Plots KDEs on GetData() outputs.
    
    :param df: dataframe output of GetData()
    :param title: plot title
    :param figName: /path/to/saved/figure/prefix
    :param targets: list of ids to filter and subplot
    :param samples: list of samples in df that should be plotted
    
    :returns: nothing 
    """
    
    sns.set_style("white")
    for s in samples:
        sns.kdeplot( df[s].apply(lambda x: np.log10(x)) )
    
    if targets:
        if title=="Genes":
            df_=df[df["gene_id"].isin(targets)]
        elif title == "Transcripts":
            df_=df[df["transcript_id"].isin(targets)]

        for s in samples:
            sns.kdeplot( df_[s].apply(lambda x: np.log10(x)),label=s+" (RBP targets)" )

    plt.gca().spines['right'].set_visible(False)
    plt.gca().spines['top'].set_visible(False)
    plt.xlabel("log10(mean counts)")
    plt.ylabel("frequency")
    plt.title(title)
    plt.legend(loc=1, borderaxespad=0,bbox_to_anchor=(1.3, 1))
    
    plt.savefig(figName+".png",dpi=300,bbox_inches='tight', pad_inches=0.1,format='png')
    plt.savefig(figName+".svg",dpi=300,bbox_inches='tight', pad_inches=0.1,format='svg')

    plt.show()
        
plotKDE(dfGenes,'Genes',outFigures+"Figure1")
plotKDE(dfTranscripts,'Transcripts',outFigures+"Figure2")
2016-10-29 12:39:17.714670
In [15]:
print datetime.now()
def plotScater(df,title,figName,c=samples):
    """
    Plots scatter plots on GetData() outputs.
    
    :param df: dataframe output of GetData()
    :param title: plot title
    :param figName: /path/to/saved/figure/prefix
    :param c: pair of samples to be plotted in list format
    
    :returns: nothing 
    """
    
    df_=df[df[c[0]]>0]
    df_=df_[df_[c[1]]>0]
    Xdata=df_[c[0]].apply(lambda x: np.log10(x))
    Ydata=df_[c[1]].apply(lambda x: np.log10(x))

    fig = plt.gcf()
    fig.set_size_inches(6, 6)

    plt.scatter(Xdata,Ydata,s=4)
    plt.xlabel("log10(%s)" %str(c[0]) )
    plt.ylabel("log10(%s)" %str(c[1]))
    plt.title(title)
    plt.savefig(figName+".png",dpi=300,bbox_inches='tight', pad_inches=0.1,format='png')
    plt.savefig(figName+".svg",dpi=300,bbox_inches='tight', pad_inches=0.1,format='svg')

    plt.show()
        
plotScater(dfGenes,'Genes',outFigures+"Figure3")
plotScater(dfTranscripts,'Transcripts',outFigures+"Figure4")
2016-10-29 12:39:19.180197

Significantly changed genes tend to accumulate towards the higher levels of expression, as highly expressed genes are easier to detect and quantify.

To identify genes that are strongly changed in relation to others with the same expression level, we plot the log2(fold change) of each gene as a function of its normalised intensity, i.e. log10(sqrt(expression in condition 1 * expression in condition 2)).

We divide the genes into bins according to their normalised intensities, identify the corresponding 0.5 log2(fold change) percentile for each bin, and fit a polynomial curve through these values.

To identify genes of interest, we mark in red the genes that lie outside the 0.5 percentile bounds and are also significantly changed.
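The quantities behind this plot can be sketched in plain Python. This is not the code of age.MA(): the function names, the equal-width binning and the toy values are assumptions made for illustration, the lower and upper bounds are assumed to be taken at perc and 100-perc (as in MA_() further down in this notebook), and the polynomial fit through the bounds is left out.

```python
import math

def percentile(values, q):
    """Percentile q (0 to 100) of a list, using linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = int(math.ceil(pos))
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def ma_bounds(control, shRNA, nbins, perc):
    """
    Returns one (bin, lower, upper) tuple per non-empty bin, where lower
    and upper are the perc and 100-perc percentiles of log2(shRNA/control).
    Assumes all values are > 0 and that the intensities are not all equal.
    """
    # normalised intensity and log2(fold change) of every gene
    A = [math.log10(math.sqrt(c * s)) for c, s in zip(control, shRNA)]
    M = [math.log(float(s) / c, 2) for c, s in zip(control, shRNA)]

    # equal-width bins over the range of normalised intensities
    lowA = min(A)
    width = (max(A) - lowA) / float(nbins)
    bins = {}
    for a, fc in zip(A, M):
        b = min(int((a - lowA) / width), nbins - 1)  # maximum goes to last bin
        bins.setdefault(b, []).append(fc)

    return [(b, percentile(bins[b], perc), percentile(bins[b], 100.0 - perc))
            for b in sorted(bins)]

# Hypothetical expression values for six genes
control_example = [10, 100, 1000, 10, 100, 1000]
shRNA_example = [20, 100, 500, 5, 400, 1000]
bounds = ma_bounds(control_example, shRNA_example, nbins=2, perc=0.5)
print(bounds)
```

A gene would then count as out of bounds when its log2(fold change) is below the lower or above the upper value of its bin.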

In [16]:
print datetime.now()

dfGenesWout,redGenesOut=age.MA(dfGenes,'Genes',outFigures+"Figure5",list_of_comparisons,spec=sigGenes,splines=False)
print "red, significantly changed genes"
sys.stdout.flush()

dfTranscriptsWout,redTranscriptsOut=age.MA(dfTranscripts,'Transcripts',outFigures+"Figure6",list_of_comparisons,spec=sigTranscripts,splines=False)  
print "red, significantly changed transcripts"
sys.stdout.flush()

dfGenesWoutA,redGenesOutA=age.MA(dfGenes,'Genes',outFigures+"Figure7",list_of_comparisons,Targets=sigGenes)
print "(A) red, significantly changed genes out of the 0.5 percentil"
sys.stdout.flush()

dfTranscritpsWoutA,redTranscriptsOutA=age.MA(dfTranscripts,'Transcripts',outFigures+"Figure8",list_of_comparisons,Targets=sigTranscripts)
print "(A) red,  significantly changed transcripts out of the 0.5 percentil"
sys.stdout.flush()
2016-10-29 12:39:29.946190
/home/jovyan/AGEpy/AGEpy/AGEpy.py:1724: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df_b["bin"]=pd.cut(df_b["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist(), nbins,labels=False)
red, significantly changed genes
red, significantly changed transcripts
/home/jovyan/AGEpy/AGEpy/AGEpy.py:1786: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  red=df_[df_["OutBounds"]==1][df_["gene_id"].isin(Targets)]["gene_id"].tolist()
/home/jovyan/AGEpy/AGEpy/AGEpy.py:1787: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  Xdata_=df_[df_["OutBounds"]==1][df_["gene_id"].isin(Targets)]["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) )].tolist()
/home/jovyan/AGEpy/AGEpy/AGEpy.py:1788: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  Ydata_=df_[df_["OutBounds"]==1][df_["gene_id"].isin(Targets)]["log2(%s/%s)" %( str(c[1]), str(c[0]) )].tolist()
(A) red, significantly changed genes out of the 0.5 percentil
/home/jovyan/AGEpy/AGEpy/AGEpy.py:1782: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  red=df_[df_["OutBounds"]==1][df_["transcript_id"].isin(Targets)]["transcript_id"].tolist()
/home/jovyan/AGEpy/AGEpy/AGEpy.py:1783: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  Xdata_=df_[df_["OutBounds"]==1][df_["transcript_id"].isin(Targets)]["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist()
/home/jovyan/AGEpy/AGEpy/AGEpy.py:1784: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  Ydata_=df_[df_["OutBounds"]==1][df_["transcript_id"].isin(Targets)]["log2(%s/%s)" %( str(c[1]), str(c[0]) ) ].tolist()
(A) red,  significantly changed transcripts out of the 0.5 percentil

We now analyse the peaks from the eCLIP data. We start by streaming the data from ENCODE into a dataframe:

6. Read eCLIP peaks

In [17]:
print datetime.now()

# We start by streaming the data from ENCODE into a dataframe
bed_A=age.GetBEDnarrowPeakgz(eCLIP_1_bednarrowPeak)
bed_B=age.GetBEDnarrowPeakgz(eCLIP_2_bednarrowPeak)

# We transform the dataframe into an object of the class BedTool, 
# suitable for usage with the functions in the pybedtools package
bedtool_A=age.dfTObedtool(bed_A)
bedtool_B=age.dfTObedtool(bed_B)
2016-10-29 12:40:00.734507
In [18]:
print datetime.now()

# We intersect both bed files and get all original entries in the output
bedtool_AB = bedtool_A.intersect(bedtool_B,wo=True,s=True)

# We transform the intersect into a Pandas dataframe
dfPeaks=pd.read_table(bedtool_AB.fn, names=["chrom_A","chromStart_A","chromEnd_A","name_A","score_A","strand_A",\
                                       "signal_Value_A","-log10(pValue)_A","-log10(qValue)_A","peak_A",\
                                       "chrom_B","chromStart_B","chromEnd_B","name_B","score_B","strand_B",\
                                       "signal_Value_B","-log10(pValue)_B","-log10(qValue)_B","peak_B",\
                                       "overlap"])
2016-10-29 12:40:11.130587
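To make the two intersect options more concrete: s=True restricts the comparison to features on the same strand, and wo=True reports both original entries together with the number of overlapping bases. The following pure-Python sketch mimics that behaviour on hypothetical peaks. It is only an illustration of the logic, not how bedtools is implemented, and the function names are made up for this example.

```python
def overlap_bp(startA, endA, startB, endB):
    """Number of overlapping bases of two half-open intervals (BED style)."""
    return max(0, min(endA, endB) - max(startA, startB))

def intersect_same_strand(peaksA, peaksB):
    """
    Pairs each peak in peaksA with every peak in peaksB that lies on
    the same chromosome and strand and overlaps it by at least one base.
    Peaks are (chrom, start, end, strand) tuples.
    """
    hits = []
    for a in peaksA:
        for b in peaksB:
            if a[0] == b[0] and a[3] == b[3]:
                n = overlap_bp(a[1], a[2], b[1], b[2])
                if n > 0:
                    # both original entries plus the overlap size
                    hits.append((a, b, n))
    return hits

# Hypothetical peaks from two replicates
rep1 = [("chr1", 100, 200, "+"), ("chr1", 500, 600, "-")]
rep2 = [("chr1", 150, 250, "+"), ("chr1", 550, 650, "+")]

hits = intersect_same_strand(rep1, rep2)
print(hits)
```

Only the first pair is reported, with an overlap of 50 bases. The second pair overlaps in coordinates but sits on opposite strands and is therefore skipped.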
In [19]:
print datetime.now()

# We intersect both bed files returning only the common regions
bedtool_AB_ = bedtool_A.intersect(bedtool_B,s=True)
dfPeaks_=pd.read_table(bedtool_AB_.fn, names=["chrom","chromStart","chromEnd","name","score","strand",\
                                       "signal_Value","-log10(pValue)","-log10(qValue)","peak"])

# We concatenate both complete and common-only intersects side-by-side
dfPeaks=pd.concat([dfPeaks_,dfPeaks],axis=1)
dfPeaks.reset_index(inplace=True, drop=True)

# We rename each row in the name column of the bed 
dfPeaks["name"]=dfPeaks_.index.tolist()
dfPeaks["name"]="Peak_"+dfPeaks["name"].astype(str)
2016-10-29 12:40:11.255338
In [20]:
print datetime.now()

# We filter to peaks where the p-value was below 0.05 in at 
# least one replicate
filteredPeaks=dfPeaks[( dfPeaks["-log10(pValue)_A"].astype(float)>(np.log10(0.05)*-1.00) ) | \
                 ( dfPeaks["-log10(pValue)_B"].astype(float)>(np.log10(0.05)*-1.00) )]

# For each of the reported values we calculate the mean between the 2 beds
for i in ["-log10(pValue)","signal_Value","-log10(qValue)","score","peak"]:
    filteredPeaks[i]=filteredPeaks[["%s_A" %i,"%s_B" %i]] .mean(axis=1)

dfPeaks=filteredPeaks[["chrom","chromStart","chromEnd","name","score","strand",\
                        "signal_Value","-log10(pValue)","-log10(qValue)","peak"]]
dfPeaks=dfPeaks.drop_duplicates()
dfPeaks.reset_index(inplace=True, drop=True)

dfPeaksA=filteredPeaks[["chrom_A","chromStart_A","chromEnd_A","name_A","score_A","strand_A",\
                        "signal_Value_A","-log10(pValue)_A","-log10(qValue)_A","peak_A"]]
dfPeaksA.columns=["chrom","chromStart","chromEnd","name","score","strand",\
                  "signal_Value","-log10(pValue)","-log10(qValue)","peak"]

dfPeaksA=dfPeaksA.drop_duplicates()
dfPeaksA.reset_index(inplace=True, drop=True)

dfPeaksB=filteredPeaks[["chrom_B","chromStart_B","chromEnd_B","name_B","score_B","strand_B",\
                        "signal_Value_B","-log10(pValue)_B","-log10(qValue)_B","peak_B"]]
dfPeaksB.columns=["chrom","chromStart","chromEnd","name","score","strand",\
                  "signal_Value","-log10(pValue)","-log10(qValue)","peak"]
dfPeaksB=dfPeaksB.drop_duplicates()
dfPeaksB.reset_index(inplace=True, drop=True)

bedtool_AB = age.dfTObedtool(dfPeaks)
bedtool_A = age.dfTObedtool(dfPeaksA)
bedtool_B = age.dfTObedtool(dfPeaksB)
2016-10-29 12:40:11.383902
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/ipykernel/__main__.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
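Because the peak files store -log10(p-value) rather than the p-value itself, a p-value below 0.05 corresponds to a stored value above -log10(0.05), which is about 1.3. A minimal sketch of this filter on hypothetical values (the function name passes is made up for this example):

```python
import math

# -log10(0.05) is the cut-off on the transformed scale
cutoff = -math.log10(0.05)

def passes(neglog10_pA, neglog10_pB, cutoff=cutoff):
    """True if at least one replicate is above the cut-off."""
    return neglog10_pA > cutoff or neglog10_pB > cutoff

# Hypothetical -log10(p-value) pairs for two peaks
kept = passes(2.0, 0.5)      # replicate A is significant
dropped = passes(1.0, 0.3)   # neither replicate is significant

print(cutoff)
print(kept)
print(dropped)
```

Note that the comparison flips direction on the transformed scale: smaller p-values give larger -log10 values.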
In [21]:
print datetime.now()
GTF=age.readGTF(outFolder+"rsem-results/annotation.gtf")
parsedGTF=pd.read_table(outFolder+"parsedGTF.tsv")
2016-10-29 12:40:11.857270
In [22]:
print datetime.now()

# We get the intersect of each respective bed with the 
# matching exons, transcripts, and genes and normalize the
# values (eg. signal_Value ) to each of these features
dfTargets=age.GetPeaksExons(bedtool_AB,parsedGTF)
dfTargetsA=age.GetPeaksExons(bedtool_A,parsedGTF)
dfTargetsB=age.GetPeaksExons(bedtool_B,parsedGTF)
2016-10-29 12:40:41.395129
***** WARNING: File /tmp/pybedtools.pZYHk2.tmp has inconsistent naming convention for record:
GL000008.2	118091	118173	ENSE00003729529.1	.	+

***** WARNING: File /tmp/pybedtools.pZYHk2.tmp has inconsistent naming convention for record:
GL000008.2	118091	118173	ENSE00003729529.1	.	+

/home/jovyan/AGEpy/AGEpy/AGEpy.py:1923: RuntimeWarning: overflow encountered in double_scalars
  tmp=tmp.groupby(field).apply(lambda l: reduce(lambda x, y: x*y, l["-log10(pValue)"]) )
***** WARNING: File /tmp/pybedtools.nWydFB.tmp has inconsistent naming convention for record:
GL000008.2	118091	118173	ENSE00003729529.1	.	+

***** WARNING: File /tmp/pybedtools.nWydFB.tmp has inconsistent naming convention for record:
GL000008.2	118091	118173	ENSE00003729529.1	.	+

***** WARNING: File /tmp/pybedtools.NyQ9u_.tmp has inconsistent naming convention for record:
GL000008.2	118091	118173	ENSE00003729529.1	.	+

***** WARNING: File /tmp/pybedtools.NyQ9u_.tmp has inconsistent naming convention for record:
GL000008.2	118091	118173	ENSE00003729529.1	.	+

In [23]:
print datetime.now()
TargetE=dfTargets["exon_id"].tolist()
TargetE=[ ss for ss in TargetE if str(ss) != "nan" ]

TargetT=parsedGTF[parsedGTF["exon_id"].isin(TargetE)]["transcript_id"].tolist()
TargetT=[ ss for ss in TargetT if str(ss) != "nan" ]

TargetG=parsedGTF[parsedGTF["exon_id"].isin(TargetE)]["gene_id"].tolist()
TargetG=[ ss for ss in TargetG if str(ss) != "nan" ]
                 
print "type\t", "exons\ttranscripts\tgenes\t"
print "total\t", "%i\t%i\t%i\t" %(len(TargetE), len(TargetT), len(TargetG))
print "unique\t", "%i\t%i\t%i\t" %(len(set(TargetE)) , len(set(TargetT)), len(set(TargetG))) 
2016-10-29 12:41:02.714482
type	exons	transcripts	genes	
total	8276	8941	8941	
unique	3162	3436	1465	

7. Merging RNAseq and eCLIP

In [24]:
print datetime.now()

# We define shRNA related genes - shGenes - as the genes which are outside
# the bounds of the percentile of choice on the MA plot 
shGenes=dfGenesWout[dfGenesWout["OutBounds"]==1]["gene_id"].tolist()
shTranscripts=dfTranscriptsWout[dfTranscriptsWout["OutBounds"]==1]["transcript_id"].tolist()
2016-10-29 12:41:02.957807
In [25]:
print datetime.now()
dfGenesWoutB,redGenesOutB=age.MA(dfGenes,'Genes',outFigures+"Figure9",list_of_comparisons,spec=TargetG,splines=False)
print "(B) red, RBP target genes"
sys.stdout.flush()

dfTranscritpsWoutB,redTranscriptsOutB=age.MA(dfTranscripts,'Transcripts',outFigures+"Figure10",list_of_comparisons,spec=TargetT, splines=False)
print "(B) red, RBP target transcripts"
sys.stdout.flush()

# We define the list of RBP target genes that are differentially
# expressed (TargetG_dif) by keeping each gene (s) in the list of 
# RBP targets (TargetG) that is also in the list of 
# significantly changed genes (sigGenes)
TargetG_dif=[ s for s in TargetG if s in sigGenes ]
TargetT_dif=[ s for s in TargetT if s in sigTranscripts ]

dfGenesWoutC,redGenesOutC=age.MA(dfGenes,'Genes',outFigures+"Figure11",list_of_comparisons,spec=TargetG_dif,splines=False)
print "(C) red, significantly changed RBP target genes"
sys.stdout.flush()

dfTranscritpsWoutC,redTranscriptsOutC=age.MA(dfTranscripts,'Transcripts',outFigures+"Figure12",list_of_comparisons,spec=TargetT_dif,splines=False)
print "(C) red, significantly changed RBP target transcripts"
sys.stdout.flush()

dfGenesWoutD,redGenesOutD=age.MA(dfGenes,'Genes',outFigures+"Figure13",list_of_comparisons,Targets=TargetG_dif)
print "(D) red, significantly changed RBP target genes out of the 0.5 percentil"
sys.stdout.flush()

dfTranscritpsWoutD,redTranscriptsOutD=age.MA(dfTranscripts,'Transcripts',outFigures+"Figure14",list_of_comparisons,Targets=TargetT_dif)
print "(D) red, significantly changed RBP target transcript out of the 0.5 percentil"
sys.stdout.flush()
2016-10-29 12:41:02.986255
(B) red, RBP target genes
(B) red, RBP target transcripts
(C) red, significantly changed RBP target genes
(C) red, significantly changed RBP target transcripts
(D) red, significantly changed RBP target genes out of the 0.5 percentil
(D) red, significantly changed RBP target transcript out of the 0.5 percentil
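The list comprehensions that build TargetG_dif and TargetT_dif test each id against a list. A sketch of the same selection using a set for the membership test is shown below. The helper keep_if_in and the example ids are hypothetical. Order and duplicates of the first list are preserved, as in the original comprehension.

```python
def keep_if_in(ids, reference):
    """Keeps the ids (in order, duplicates included) found in reference."""
    reference = set(reference)  # membership tests on a set are fast
    return [s for s in ids if s in reference]

# Hypothetical ids
targets_example = ["g1", "g2", "g2", "g3"]
sig_ids_example = ["g2", "g3", "g4"]

targets_dif_example = keep_if_in(targets_example, sig_ids_example)
print(targets_dif_example)
```

With several thousand targets and significant ids, this can noticeably shorten the run time of the cell.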
In [26]:
print datetime.now()
        
plotKDE(dfGenes,'Genes',outFigures+"Figure15",dfTargets["gene_id"].tolist())
plotKDE(dfTranscripts,'Transcripts',outFigures+"Figure16",dfTargets["transcript_id"].tolist())
2016-10-29 12:41:49.554730
In [27]:
print datetime.now()
def plotKDE_nbind(df,title,targetCol,figName):
    """
    Plots KDEs on any field of choice from a Pandas dataframe.
    
    :param df: Pandas dataframe 
    :param title: plot title
    :param targetCol: header of the column to use
    :param figName: /path/to/saved/figure/prefix
    
    :returns: nothing 
    """

    sns.set_style("white")
    
    df_=df[["transcript_id",targetCol]].drop_duplicates()
    # we plot from the de-duplicated table so each transcript is counted once
    sns.kdeplot( df_[targetCol].apply(lambda x: np.log10(x)) )

    plt.gca().spines['right'].set_visible(False)
    plt.gca().spines['top'].set_visible(False)
    plt.xlabel("log10(%s)" %targetCol)
    plt.ylabel("frequency")
    plt.title(title)
    plt.gca().legend().set_visible(False)
    plt.savefig(figName+".png",dpi=300,bbox_inches='tight', pad_inches=0.1,format='png')
    plt.savefig(figName+".svg",dpi=300,bbox_inches='tight', pad_inches=0.1,format='svg')
    plt.show()
        
plotKDE_nbind(dfTargets,'Transcripts','transcript_id_count',outFigures+"Figure17")
2016-10-29 12:41:51.201742
In [28]:
print datetime.now()
def MA_(df,title,figName,c=list_of_comparisons, daType="counts",nbins=10,perc=.1,deg=3,eq=True,splines=True,spec=None,Targets=None,ylim=None,sizeRed=8, dfTargetsAn=dfTargets, targetCol='transcript_id_count',spMAX=None):    
    """
    Plots an MA-like plot using a column of choice as normalized intensities.
    
    :param df: dataframe output of GetData()
    :param title: plot title, 'Genes' or 'Transcripts'
    :param figName: /path/to/saved/figure/prefix
    :param c: pair of samples to be plotted in list format
    :param daType: data type, ie. 'counts' or 'FPKM'
    :param nbins: number of bins on normalized intensities to fit the splines
    :param perc: log2(fold change) percentile to which the splines will be fitted
    :param deg: degrees of freedom used to fit the splines
    :param eq: if true assumes for each bin that the lower and upper values are equally distant to 0, taking the smaller distance for both
    :param spec: list of ids to be highlighted 
    :param Targets: list of ids that will be highlighted if outside of the fitted splines
    :param ylim: a list of limits to apply on the y-axis of the plot
    :param sizeRed: size of the highlight marker
    :param dfTargetsAn: a Pandas dataframe with the 'targetCol' and the respective ids of values to plot in the x axis
    :param targetCol: target column to use in the x axis eg. 'transcript_id_count'
    :param spMAX: maximum values of x to use for the splines

    :returns df_: a Pandas dataframe similar to the GetData() output with the normalized intensities column and an 'OutBounds' column set to 1 for rows outside the fitted splines.
    :returns red: list of ids that are highlighted
    """

    df_=df[df[c[0]]>0]
    df_=df_[df_[c[1]]>0]
    
    if title == "Transcripts":
        dfTargets_=dfTargetsAn[["transcript_id",targetCol]].drop_duplicates()
        df_=pd.merge(df_,dfTargets_,on=["transcript_id"],how="left")
    elif title == "Genes":
        dfTargets_=dfTargetsAn[["gene_id",targetCol]].drop_duplicates()
        df_=pd.merge(df_,dfTargets_,on=["gene_id"],how="left")
    
    df_=df_.fillna(0)
    
    df_["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ]=df_[targetCol]#.apply(lambda x: np.log10(x)) 
    
    if daType=="counts":
        lowLim=-1
    elif daType=="FPKM":
        lowLim=np.log10(0.1)
            
    df_b=df_[df_["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ]>lowLim ]
    df_b.reset_index(inplace=True, drop=True)

    Xdata=df_["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist()
    Ydata=df_["log2(%s/%s)" %( str(c[1]), str(c[0]) )].tolist()

    minX=min(Xdata)
    maxX=max(Xdata)

    minX_=min(df_b["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist())
    maxX_=max(df_b["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist())

    df_b["bin"]=df_b["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ]
    
    spl=[]
    for b in set( df_b["bin"].tolist() ):
        tmp=df_b[df_b["bin"]==b]
        Xbin = tmp["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist()
        Xval = np.mean([max(Xbin),min(Xbin)])
        Ybin = tmp["log2(%s/%s)" %( str(c[1]), str(c[0]) )].tolist()
        YvalP=np.percentile(Ybin,100.00-float(perc)) 
        YvalM=np.percentile(Ybin,float(perc))
        spl.append([Xval,YvalP,YvalM])

    spl=pd.DataFrame( spl,columns=["X","Upper","Lower"],index=range(len(spl)) )
    
    def CheckMin(df):
        U=abs(df["Upper"])
        L=abs(df["Lower"])
        return min([U,L])

    spl["min"]=spl.apply(CheckMin, axis=1)
    spl=spl[spl["min"]!=0]
    if spMAX:
        spl=spl[spl["X"]<spMAX]
    else:
        spl=spl[spl["X"]<35]

    coeffsUpper = np.polyfit(spl["X"].tolist(), spl["Upper"].tolist(), deg)
    coeffsLower = np.polyfit(spl["X"].tolist(), spl["Lower"].tolist(), deg) 

    Xspl = np.array(np.linspace(minX, maxX, 10*nbins)) 

    if eq:
        coeffsUpper = np.polyfit(spl["X"].tolist(), spl["min"].tolist(), deg)
        coeffsLower = np.polyfit(spl["X"].tolist(), [ ss*-1 for ss in spl["min"].tolist()] , deg) 
        YsplUpper = np.polyval(coeffsUpper, Xspl)
        YsplLower = np.polyval(coeffsLower, Xspl)

    else:
        coeffsUpper = np.polyfit(spl["X"].tolist(), spl["Upper"].tolist(), deg)
        coeffsLower = np.polyfit(spl["X"].tolist(), spl["Lower"].tolist(), deg) 
        YsplUpper = np.polyval(coeffsUpper, Xspl)
        YsplLower = np.polyval(coeffsLower, Xspl)
    
    def checkOutbounds(df,Xspl=Xspl,coeffsUpper=coeffsUpper,coeffsLower=coeffsLower,c=c):
        x=df["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) )]
        y=df["log2(%s/%s)" %( str(c[1]), str(c[0]) )]
        if y < 0:
            v=np.polyval(coeffsLower, x)
            if y < v:
                return 1
            else:
                return 0
        else:
            v=np.polyval(coeffsUpper, x)
            if y > v:
                return 1
            else:
                return 0

    df_["OutBounds"]=df_.apply(checkOutbounds,axis=1)

    if Targets:
        if title == "Transcripts":
            red=df_[df_["OutBounds"]==1][df_["transcript_id"].isin(Targets)]["transcript_id"].tolist()
            Xdata_=df_[df_["OutBounds"]==1][df_["transcript_id"].isin(Targets)]["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist()
            Ydata_=df_[df_["OutBounds"]==1][df_["transcript_id"].isin(Targets)]["log2(%s/%s)" %( str(c[1]), str(c[0]) ) ].tolist()
        elif title == "Genes":
            red=df_[df_["OutBounds"]==1][df_["gene_id"].isin(Targets)]["gene_id"].tolist()
            Xdata_=df_[df_["OutBounds"]==1][df_["gene_id"].isin(Targets)]["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) )].tolist()
            Ydata_=df_[df_["OutBounds"]==1][df_["gene_id"].isin(Targets)]["log2(%s/%s)" %( str(c[1]), str(c[0]) )].tolist()
    elif spec:
        if title == "Transcripts":
            red=df_[df_["transcript_id"].isin(spec)]["transcript_id"].tolist()
            Xdata_=df_[df_["transcript_id"].isin(spec)]["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist()
            Ydata_=df_[df_["transcript_id"].isin(spec)]["log2(%s/%s)" %( str(c[1]), str(c[0]) ) ].tolist()
        elif title == "Genes":
            red=df_[df_["gene_id"].isin(spec)]["gene_id"].tolist()
            Xdata_=df_[df_["gene_id"].isin(spec)]["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) )].tolist()
            Ydata_=df_[df_["gene_id"].isin(spec)]["log2(%s/%s)" %( str(c[1]), str(c[0]) )].tolist()   
    else:
        Xdata_=df_[df_["OutBounds"]==1]["normalized intensities (%s vs. %s)" %( str(c[0]), str(c[1]) ) ].tolist()
        Ydata_=df_[df_["OutBounds"]==1]["log2(%s/%s)" %( str(c[1]), str(c[0]) ) ].tolist()
        if title == "Transcripts":
            red=df_[df_["OutBounds"]==1]["transcript_id"].tolist()
        elif title == "Genes":
            red=df_[df_["OutBounds"]==1]["gene_id"].tolist()
        
    fig = plt.gcf()

    fig.set_size_inches(6, 6)
    plt.scatter(Xdata,Ydata, s=2)
    plt.scatter(Xdata_,Ydata_,s=sizeRed, c='r')
    if splines:
        plt.plot(Xspl,YsplUpper, "-",lw=0.5, c='g')
        plt.plot(Xspl,YsplLower,"-", lw=0.5,c='g')
        
    plt.xlabel("%s" %targetCol )
    plt.ylabel("log2(%s/%s)" %( str(c[1]), str(c[0]) ))

    if ylim:
        plt.ylim(ylim[0],ylim[1])
    else:
        ylims=max([abs(min(Ydata)), abs(max(Ydata)) ])
        plt.ylim(-ylims*1.1,ylims*1.1)
    if spMAX:
        plt.xlim(-1,spMAX)
    else:
        plt.xlim(-1,35)

    plt.title(title)
    
    plt.savefig(figName+".png",dpi=300,bbox_inches='tight', pad_inches=0.1,format='png')
    plt.savefig(figName+".svg",dpi=300,bbox_inches='tight', pad_inches=0.1,format='svg')
    
    plt.show()

    return df_,red

dfGenesWoutE,redGenesOutE=MA_(dfGenes,'Genes',outFigures+'Figure18',Targets=TargetG_dif, targetCol="gene_id_count",nbins=1000,deg=2,perc=.1,eq=True, spMAX=85)#,splines=False)

dfTranscritpsWoutE,redTranscriptsOutE=MA_(dfTranscripts,'Transcripts',outFigures+'Figure19',Targets=TargetT_dif, nbins=1000,deg=2,perc=1,eq=True)#,splines=False)
2016-10-29 12:41:52.118508
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/ipykernel/__main__.py:125: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/ipykernel/__main__.py:126: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/ipykernel/__main__.py:127: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/ipykernel/__main__.py:121: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/ipykernel/__main__.py:122: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/ipykernel/__main__.py:123: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
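
The UserWarning messages above come from the chained selections inside MA_, where a boolean mask built on the full dataframe is applied to an already filtered one, so pandas has to realign the mask. The result is still the expected one. A way to avoid the warning is to combine both conditions into a single mask, as in the small sketch below. The dataframe contents and the targets list are hypothetical and only serve as illustration.

```python
import pandas as pd

# Toy dataframe (assumption): mimics the two columns used for the selection.
df = pd.DataFrame({
    "gene_id": ["g1", "g2", "g3", "g4"],
    "OutBounds": [1, 0, 1, 1],
})
targets = ["g1", "g2", "g4"]  # hypothetical list of target ids

# Chained form as used in MA_ (triggers the reindexing warning):
# df[df["OutBounds"] == 1][df["gene_id"].isin(targets)]["gene_id"].tolist()

# Same selection with one combined mask, applied once to the full dataframe.
mask = (df["OutBounds"] == 1) & (df["gene_id"].isin(targets))
red = df.loc[mask, "gene_id"].tolist()
```

With this toy input, red holds the ids that are both outside the bounds and in the target list.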

8. Biotype and gene ontology annotations

Gene ontology is one of the best examples of the value of proper annotation and curation of gene function and cellular localisation.

To annotate our tables with gene ontology terms as well as gene biotypes we will use the biomaRt package for R and ENSEMBL's biomart service. It is important to realise here that ENSEMBL releases a new version of its biomart database with each genome release and that older database releases are transferred to archive.ensembl.org. In our case, the header of our downloaded GTF clearly indicates that we require a release matching "Ensembl 83" - dec2015.archive.ensembl.org.

A complete introduction to biomart and the construction of queries can be found elsewhere.
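
As a small illustration of how the matching release can be read from a GTF header, the sketch below searches the leading comment lines for an "Ensembl NN" mention. The function name, the example header lines and the dictionary are assumptions made for this illustration. Only the pairing of release 83 with dec2015.archive.ensembl.org is taken from the text above. Header formats may differ between annotation providers.

```python
import re

def ensembl_release_from_header(header_lines):
    """Return the Ensembl release number found in GTF comment lines, or None."""
    for line in header_lines:
        if not line.startswith("#"):
            break  # header comments are over, stop searching
        match = re.search(r"Ensembl\s+(\d+)", line)
        if match:
            return int(match.group(1))
    return None

# Illustrative header lines (assumption), not copied from the downloaded file.
example_header = [
    "##description: evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83)",
    "##provider: GENCODE",
]
release = ensembl_release_from_header(example_header)

# Only the release/host pair stated in the text is filled in here.
archive_hosts = {83: "dec2015.archive.ensembl.org"}
host = archive_hosts.get(release)
```

The resulting host string is the value that the R cells below pass to listMarts() and useMart().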

In [29]:
print datetime.now()
2016-10-29 12:42:05.479539
In [30]:
%%R
library("biomaRt")
host="dec2015.archive.ensembl.org"
listMarts(host=host)
               biomart               version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 83
2     ENSEMBL_MART_SNP  Ensembl Variation 83
3 ENSEMBL_MART_FUNCGEN Ensembl Regulation 83
4    ENSEMBL_MART_VEGA               Vega 63
5                pride        PRIDE (EBI UK)
In [31]:
%%R
ensembl=useMart("ENSEMBL_MART_ENSEMBL",host=host)
listDatasets(ensembl)
                          dataset                                 description
1          oanatinus_gene_ensembl      Ornithorhynchus anatinus genes (OANA5)
2         cporcellus_gene_ensembl             Cavia porcellus genes (cavPor3)
3         gaculeatus_gene_ensembl      Gasterosteus aculeatus genes (BROADS1)
4          lafricana_gene_ensembl          Loxodonta africana genes (loxAfr3)
5  itridecemlineatus_gene_ensembl  Ictidomys tridecemlineatus genes (spetri2)
6         choffmanni_gene_ensembl         Choloepus hoffmanni genes (choHof1)
7          csavignyi_gene_ensembl              Ciona savignyi genes (CSAV2.0)
8             fcatus_gene_ensembl         Felis catus genes (Felis_catus_6.2)
9        rnorvegicus_gene_ensembl          Rattus norvegicus genes (Rnor_6.0)
10         psinensis_gene_ensembl      Pelodiscus sinensis genes (PelSin_1.0)
11          cjacchus_gene_ensembl   Callithrix jacchus genes (C_jacchus3.2.1)
12        ttruncatus_gene_ensembl          Tursiops truncatus genes (turTru1)
13       scerevisiae_gene_ensembl    Saccharomyces cerevisiae genes (R64-1-1)
14          celegans_gene_ensembl     Caenorhabditis elegans genes (WBcel235)
15          csabaeus_gene_ensembl       Chlorocebus sabaeus genes (ChlSab1.1)
16        oniloticus_gene_ensembl     Oreochromis niloticus genes (Orenil1.0)
17         trubripes_gene_ensembl           Takifugu rubripes genes (FUGU4.0)
18        amexicanus_gene_ensembl        Astyanax mexicanus genes (AstMex102)
19          pmarinus_gene_ensembl     Petromyzon marinus genes (Pmarinus_7.0)
20        eeuropaeus_gene_ensembl         Erinaceus europaeus genes (eriEur1)
21       falbicollis_gene_ensembl      Ficedula albicollis genes (FicAlb_1.4)
22      ptroglodytes_gene_ensembl          Pan troglodytes genes (CHIMP2.1.4)
23         etelfairi_gene_ensembl            Echinops telfairi genes (TENREC)
24     cintestinalis_gene_ensembl               Ciona intestinalis genes (KH)
25       nleucogenys_gene_ensembl         Nomascus leucogenys genes (Nleu1.0)
26           sscrofa_gene_ensembl              Sus scrofa genes (Sscrofa10.2)
27        ocuniculus_gene_ensembl     Oryctolagus cuniculus genes (OryCun2.0)
28     dnovemcinctus_gene_ensembl      Dasypus novemcinctus genes (Dasnov3.0)
29         pcapensis_gene_ensembl           Procavia capensis genes (proCap1)
30          tguttata_gene_ensembl     Taeniopygia guttata genes (taeGut3.2.4)
31        mlucifugus_gene_ensembl            Myotis lucifugus genes (myoLuc2)
32          hsapiens_gene_ensembl              Homo sapiens genes (GRCh38.p5)
33          pformosa_gene_ensembl       Poecilia formosa genes (PoeFor_5.1.2)
34             mfuro_gene_ensembl  Mustela putorius furo genes (MusPutFur1.0)
35        tbelangeri_gene_ensembl            Tupaia belangeri genes (tupBel1)
36           ggallus_gene_ensembl               Gallus gallus genes (Galgal4)
37       xtropicalis_gene_ensembl           Xenopus tropicalis genes (JGI4.2)
38         ecaballus_gene_ensembl              Equus caballus genes (EquCab2)
39           pabelii_gene_ensembl                  Pongo abelii genes (PPYG2)
40        xmaculatus_gene_ensembl   Xiphophorus maculatus genes (Xipmac4.4.2)
41            drerio_gene_ensembl                  Danio rerio genes (GRCz10)
42        lchalumnae_gene_ensembl         Latimeria chalumnae genes (LatCha1)
43     tnigroviridis_gene_ensembl Tetraodon nigroviridis genes (TETRAODON8.0)
44      amelanoleuca_gene_ensembl      Ailuropoda melanoleuca genes (ailMel1)
45          mmulatta_gene_ensembl               Macaca mulatta genes (MMUL_1)
46         pvampyrus_gene_ensembl           Pteropus vampyrus genes (pteVam1)
47           panubis_gene_ensembl              Papio anubis genes (PapAnu2.0)
48        mdomestica_gene_ensembl       Monodelphis domestica genes (monDom5)
49     acarolinensis_gene_ensembl       Anolis carolinensis genes (AnoCar2.0)
50            vpacos_gene_ensembl               Vicugna pacos genes (vicPac1)
51         tsyrichta_gene_ensembl            Tarsius syrichta genes (tarSyr1)
52        ogarnettii_gene_ensembl          Otolemur garnettii genes (OtoGar3)
53     dmelanogaster_gene_ensembl       Drosophila melanogaster genes (BDGP6)
54          mmurinus_gene_ensembl          Microcebus murinus genes (micMur1)
55         loculatus_gene_ensembl        Lepisosteus oculatus genes (LepOcu1)
56          olatipes_gene_ensembl                Oryzias latipes genes (HdrR)
57          ggorilla_gene_ensembl           Gorilla gorilla genes (gorGor3.1)
58         oprinceps_gene_ensembl         Ochotona princeps genes (OchPri2.0)
59            dordii_gene_ensembl             Dipodomys ordii genes (dipOrd1)
60            oaries_gene_ensembl                 Ovis aries genes (Oar_v3.1)
61         mmusculus_gene_ensembl              Mus musculus genes (GRCm38.p4)
62        mgallopavo_gene_ensembl            Meleagris gallopavo genes (UMD2)
63           gmorhua_gene_ensembl                Gadus morhua genes (gadMor1)
64    aplatyrhynchos_gene_ensembl     Anas platyrhynchos genes (BGI_duck_1.0)
65          saraneus_gene_ensembl               Sorex araneus genes (sorAra1)
66         sharrisii_gene_ensembl       Sarcophilus harrisii genes (DEVIL7.0)
67          meugenii_gene_ensembl           Macropus eugenii genes (Meug_1.0)
68           btaurus_gene_ensembl                   Bos taurus genes (UMD3.1)
69       cfamiliaris_gene_ensembl          Canis familiaris genes (CanFam3.1)
           version
1            OANA5
2          cavPor3
3          BROADS1
4          loxAfr3
5          spetri2
6          choHof1
7          CSAV2.0
8  Felis_catus_6.2
9         Rnor_6.0
10      PelSin_1.0
11  C_jacchus3.2.1
12         turTru1
13         R64-1-1
14        WBcel235
15       ChlSab1.1
16       Orenil1.0
17         FUGU4.0
18       AstMex102
19    Pmarinus_7.0
20         eriEur1
21      FicAlb_1.4
22      CHIMP2.1.4
23          TENREC
24              KH
25         Nleu1.0
26     Sscrofa10.2
27       OryCun2.0
28       Dasnov3.0
29         proCap1
30     taeGut3.2.4
31         myoLuc2
32       GRCh38.p5
33    PoeFor_5.1.2
34    MusPutFur1.0
35         tupBel1
36         Galgal4
37          JGI4.2
38         EquCab2
39           PPYG2
40     Xipmac4.4.2
41          GRCz10
42         LatCha1
43    TETRAODON8.0
44         ailMel1
45          MMUL_1
46         pteVam1
47       PapAnu2.0
48         monDom5
49       AnoCar2.0
50         vicPac1
51         tarSyr1
52         OtoGar3
53           BDGP6
54         micMur1
55         LepOcu1
56            HdrR
57       gorGor3.1
58       OchPri2.0
59         dipOrd1
60        Oar_v3.1
61       GRCm38.p4
62            UMD2
63         gadMor1
64    BGI_duck_1.0
65         sorAra1
66        DEVIL7.0
67        Meug_1.0
68          UMD3.1
69       CanFam3.1
In [32]:
%%R
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
listAttributes(ensembl)
                                                              name
1                                                  ensembl_gene_id
2                                            ensembl_transcript_id
3                                               ensembl_peptide_id
4                                                  ensembl_exon_id
5                                                      description
6                                                  chromosome_name
7                                                   start_position
8                                                     end_position
9                                                           strand
10                                                            band
11                                                transcript_start
12                                                  transcript_end
13                                        transcription_start_site
14                                               transcript_length
15                                                  transcript_tsl
16                                        transcript_gencode_basic
17                                               transcript_appris
18                                              external_gene_name
19                                            external_gene_source
20                                        external_transcript_name
21                                 external_transcript_source_name
22                                                transcript_count
23                                           percentage_gc_content
24                                                    gene_biotype
25                                              transcript_biotype
26                                                          source
27                                               transcript_source
28                                                          status
29                                               transcript_status
30                                                         version
31                                              transcript_version
32                                           phenotype_description
33                                                     source_name
34                                               study_external_id
35                                                           go_id
36                                                       name_1006
37                                                 definition_1006
38                                                 go_linkage_type
39                                                  namespace_1003
40                                            goslim_goa_accession
41                                          goslim_goa_description
42                                                    arrayexpress
43                                                          chembl
44                                   clone_based_ensembl_gene_name
45                             clone_based_ensembl_transcript_name
46                                      clone_based_vega_gene_name
47                                clone_based_vega_transcript_name
48                                                            ccds
49                                                       dbass3_id
50                                                     dbass3_name
51                                                       dbass5_id
52                                                     dbass5_name
53                                                            embl
54                                               ens_hs_transcript
55                                              ens_hs_translation
56                                                    ens_lrg_gene
57                                              ens_lrg_transcript
58                                                      entrezgene
59                                      entrezgene_transcript_name
60                                                             hpa
61                                                            ottg
62                                                            ottt
63                                                            ottp
64                                                         hgnc_id
65                                                     hgnc_symbol
66                                            hgnc_transcript_name
67                                                          merops
68                                            mim_morbid_accession
69                                          mim_morbid_description
70                                              mim_gene_accession
71                                            mim_gene_description
72                                               mirbase_accession
73                                                      mirbase_id
74                                         mirbase_transcript_name
75                                                             pdb
76                                                      protein_id
77                                                        reactome
78                                                   reactome_gene
79                                             reactome_transcript
80                                                     refseq_mrna
81                                           refseq_mrna_predicted
82                                                    refseq_ncrna
83                                          refseq_ncrna_predicted
84                                                  refseq_peptide
85                                        refseq_peptide_predicted
86                                                            rfam
87                                            rfam_transcript_name
88                                                      rnacentral
89                                                            ucsc
90                                                         unigene
91                                                         uniparc
92                                                uniprot_sptrembl
93                                               uniprot_swissprot
94                                                uniprot_genename
95                                                   wikigene_name
96                                                     wikigene_id
97                                            wikigene_description
98                               efg_agilent_sureprint_g3_ge_8x60k
99                            efg_agilent_sureprint_g3_ge_8x60k_v2
100                               efg_agilent_wholegenome_4x44k_v1
101                               efg_agilent_wholegenome_4x44k_v2
102                                                   affy_hc_g110
103                                                  affy_hg_focus
104                                            affy_hg_u133_plus_2
105                                                affy_hg_u133a_2
106                                                  affy_hg_u133a
107                                                  affy_hg_u133b
108                                                 affy_hg_u95av2
109                                                   affy_hg_u95b
110                                                   affy_hg_u95c
111                                                   affy_hg_u95d
112                                                   affy_hg_u95e
113                                                   affy_hg_u95a
114                                                  affy_hugenefl
115                                                   affy_hta_2_0
116                                            affy_huex_1_0_st_v2
117                                          affy_hugene_1_0_st_v1
118                                          affy_hugene_2_0_st_v1
119                                                 affy_primeview
120                                                  affy_u133_x3p
121                                                agilent_cgh_44b
122                                                       codelink
123                                          illumina_humanwg_6_v1
124                                          illumina_humanwg_6_v2
125                                          illumina_humanwg_6_v3
126                                         illumina_humanht_12_v3
127                                         illumina_humanht_12_v4
128                                         illumina_humanref_8_v3
129                                               phalanx_onearray
130                                                         family
131                                             family_description
132                                                          pirsf
133                                                    pirsf_start
134                                                      pirsf_end
135                                                    superfamily
136                                              superfamily_start
137                                                superfamily_end
138                                                          smart
139                                                    smart_start
140                                                      smart_end
141                                                          hamap
142                                                    hamap_start
143                                                      hamap_end
144                                                        profile
145                                                  profile_start
146                                                    profile_end
147                                                        prosite
148                                                  prosite_start
149                                                    prosite_end
150                                                         prints
151                                                   prints_start
152                                                     prints_end
153                                                           pfam
154                                                     pfam_start
155                                                       pfam_end
156                                                        tigrfam
157                                                  tigrfam_start
158                                                    tigrfam_end
159                                                         gene3d
160                                                   gene3d_start
161                                                     gene3d_end
162                                                     hmmpanther
163                                               hmmpanther_start
164                                                 hmmpanther_end
165                                                       interpro
166                                     interpro_short_description
167                                           interpro_description
168                                                 interpro_start
169                                                   interpro_end
170                                                 low_complexity
171                                           low_complexity_start
172                                             low_complexity_end
173                                           transmembrane_domain
174                                     transmembrane_domain_start
175                                       transmembrane_domain_end
176                                                  signal_domain
177                                            signal_domain_start
178                                              signal_domain_end
179                                                         ncoils
180                                                   ncoils_start
181                                                     ncoils_end
182                                                ensembl_gene_id
183                                          ensembl_transcript_id
184                                             ensembl_peptide_id
185                                                chromosome_name
186                                                 start_position
187                                                   end_position
188                                               transcript_start
189                                                 transcript_end
190                                       transcription_start_site
191                                              transcript_length
192                                                         strand
193                                             external_gene_name
194                                           external_gene_source
195                                                    5_utr_start
196                                                      5_utr_end
197                                                    3_utr_start
198                                                      3_utr_end
199                                                     cds_length
200                                               transcript_count
201                                                    description
202                                                   gene_biotype
203                                               exon_chrom_start
204                                                 exon_chrom_end
205                                                is_constitutive
206                                                           rank
207                                                          phase
208                                                      end_phase
209                                              cdna_coding_start
210                                                cdna_coding_end
211                                           genomic_coding_start
212                                             genomic_coding_end
213                                                ensembl_exon_id
214                                                      cds_start
215                                                        cds_end
216                                                ensembl_gene_id
217                                          ensembl_transcript_id
218                                             ensembl_peptide_id
219                                                chromosome_name
220                                                 start_position
221                                                   end_position
222                                                         strand
223                                                           band
224                                             external_gene_name
225                                           external_gene_source
226                                               transcript_count
227                                          percentage_gc_content
228                                                    description
229                                    vpacos_homolog_ensembl_gene
230                    vpacos_homolog_canonical_transcript_protein
231                                 vpacos_homolog_ensembl_peptide
232                                      vpacos_homolog_chromosome
233                                     vpacos_homolog_chrom_start
234                                       vpacos_homolog_chrom_end
235                                  vpacos_homolog_orthology_type
236                                         vpacos_homolog_subtype
237                            vpacos_homolog_orthology_confidence
238                                         vpacos_homolog_perc_id
239                                      vpacos_homolog_perc_id_r1
240                                  pformosa_homolog_ensembl_gene
241                  pformosa_homolog_canonical_transcript_protein
242                               pformosa_homolog_ensembl_peptide
243                                    pformosa_homolog_chromosome
244                                   pformosa_homolog_chrom_start
245                                     pformosa_homolog_chrom_end
246                                pformosa_homolog_orthology_type
247                                       pformosa_homolog_subtype
248                          pformosa_homolog_orthology_confidence
249                                       pformosa_homolog_perc_id
250                                    pformosa_homolog_perc_id_r1
251                             acarolinensis_homolog_ensembl_gene
252             acarolinensis_homolog_canonical_transcript_protein
253                          acarolinensis_homolog_ensembl_peptide
254                               acarolinensis_homolog_chromosome
255                              acarolinensis_homolog_chrom_start
256                                acarolinensis_homolog_chrom_end
257                           acarolinensis_homolog_orthology_type
258                                  acarolinensis_homolog_subtype
259                     acarolinensis_homolog_orthology_confidence
260                                  acarolinensis_homolog_perc_id
261                               acarolinensis_homolog_perc_id_r1
262                                       acarolinensis_homolog_dn
263                                       acarolinensis_homolog_ds
264                             dnovemcinctus_homolog_ensembl_gene
265             dnovemcinctus_homolog_canonical_transcript_protein
266                          dnovemcinctus_homolog_ensembl_peptide
267                               dnovemcinctus_homolog_chromosome
268                              dnovemcinctus_homolog_chrom_start
269                                dnovemcinctus_homolog_chrom_end
270                           dnovemcinctus_homolog_orthology_type
271                                  dnovemcinctus_homolog_subtype
272                     dnovemcinctus_homolog_orthology_confidence
273                                  dnovemcinctus_homolog_perc_id
274                               dnovemcinctus_homolog_perc_id_r1
275                                       dnovemcinctus_homolog_dn
276                                       dnovemcinctus_homolog_ds
277                                   gmorhua_homolog_ensembl_gene
278                   gmorhua_homolog_canonical_transcript_protein
279                                gmorhua_homolog_ensembl_peptide
280                                     gmorhua_homolog_chromosome
281                                    gmorhua_homolog_chrom_start
282                                      gmorhua_homolog_chrom_end
283                                 gmorhua_homolog_orthology_type
284                                        gmorhua_homolog_subtype
285                           gmorhua_homolog_orthology_confidence
286                                        gmorhua_homolog_perc_id
287                                     gmorhua_homolog_perc_id_r1
288                                ogarnettii_homolog_ensembl_gene
289                ogarnettii_homolog_canonical_transcript_protein
290                             ogarnettii_homolog_ensembl_peptide
291                                  ogarnettii_homolog_chromosome
292                                 ogarnettii_homolog_chrom_start
293                                   ogarnettii_homolog_chrom_end
294                              ogarnettii_homolog_orthology_type
295                                     ogarnettii_homolog_subtype
296                        ogarnettii_homolog_orthology_confidence
297                                     ogarnettii_homolog_perc_id
298                                  ogarnettii_homolog_perc_id_r1
299                                          ogarnettii_homolog_dn
300                                          ogarnettii_homolog_ds
301                                  celegans_homolog_ensembl_gene
302                  celegans_homolog_canonical_transcript_protein
303                               celegans_homolog_ensembl_peptide
304                                    celegans_homolog_chromosome
305                                   celegans_homolog_chrom_start
306                                     celegans_homolog_chrom_end
307                                celegans_homolog_orthology_type
308                                       celegans_homolog_subtype
309                          celegans_homolog_orthology_confidence
310                                       celegans_homolog_perc_id
311                                    celegans_homolog_perc_id_r1
312                                            celegans_homolog_dn
313                                            celegans_homolog_ds
314                                    fcatus_homolog_ensembl_gene
315                    fcatus_homolog_canonical_transcript_protein
316                                 fcatus_homolog_ensembl_peptide
317                                      fcatus_homolog_chromosome
318                                     fcatus_homolog_chrom_start
319                                       fcatus_homolog_chrom_end
320                                  fcatus_homolog_orthology_type
321                                         fcatus_homolog_subtype
322                            fcatus_homolog_orthology_confidence
323                                         fcatus_homolog_perc_id
324                                      fcatus_homolog_perc_id_r1
325                                              fcatus_homolog_dn
326                                              fcatus_homolog_ds
327                                amexicanus_homolog_ensembl_gene
328                amexicanus_homolog_canonical_transcript_protein
329                             amexicanus_homolog_ensembl_peptide
330                                  amexicanus_homolog_chromosome
331                                 amexicanus_homolog_chrom_start
332                                   amexicanus_homolog_chrom_end
333                              amexicanus_homolog_orthology_type
334                                     amexicanus_homolog_subtype
335                        amexicanus_homolog_orthology_confidence
336                                     amexicanus_homolog_perc_id
337                                  amexicanus_homolog_perc_id_r1
338                                   ggallus_homolog_ensembl_gene
339                   ggallus_homolog_canonical_transcript_protein
340                                ggallus_homolog_ensembl_peptide
341                                     ggallus_homolog_chromosome
342                                    ggallus_homolog_chrom_start
343                                      ggallus_homolog_chrom_end
344                                 ggallus_homolog_orthology_type
345                                        ggallus_homolog_subtype
346                           ggallus_homolog_orthology_confidence
347                                        ggallus_homolog_perc_id
348                                     ggallus_homolog_perc_id_r1
349                                             ggallus_homolog_dn
350                                             ggallus_homolog_ds
351                              ptroglodytes_homolog_ensembl_gene
352              ptroglodytes_homolog_canonical_transcript_protein
353                           ptroglodytes_homolog_ensembl_peptide
354                                ptroglodytes_homolog_chromosome
355                               ptroglodytes_homolog_chrom_start
356                                 ptroglodytes_homolog_chrom_end
357                            ptroglodytes_homolog_orthology_type
358                                   ptroglodytes_homolog_subtype
359                      ptroglodytes_homolog_orthology_confidence
360                                   ptroglodytes_homolog_perc_id
361                                ptroglodytes_homolog_perc_id_r1
362                                        ptroglodytes_homolog_dn
363                                        ptroglodytes_homolog_ds
364                                 psinensis_homolog_ensembl_gene
365                 psinensis_homolog_canonical_transcript_protein
366                              psinensis_homolog_ensembl_peptide
367                                   psinensis_homolog_chromosome
368                                  psinensis_homolog_chrom_start
369                                    psinensis_homolog_chrom_end
370                               psinensis_homolog_orthology_type
371                                      psinensis_homolog_subtype
372                         psinensis_homolog_orthology_confidence
373                                      psinensis_homolog_perc_id
374                                   psinensis_homolog_perc_id_r1
375                                           psinensis_homolog_dn
376                                           psinensis_homolog_ds
377                             cintestinalis_homolog_ensembl_gene
378             cintestinalis_homolog_canonical_transcript_protein
379                          cintestinalis_homolog_ensembl_peptide
380                               cintestinalis_homolog_chromosome
381                              cintestinalis_homolog_chrom_start
382                                cintestinalis_homolog_chrom_end
383                           cintestinalis_homolog_orthology_type
384                                  cintestinalis_homolog_subtype
385                     cintestinalis_homolog_orthology_confidence
386                                  cintestinalis_homolog_perc_id
387                               cintestinalis_homolog_perc_id_r1
388                                       cintestinalis_homolog_dn
389                                       cintestinalis_homolog_ds
390                                 csavignyi_homolog_ensembl_gene
391                 csavignyi_homolog_canonical_transcript_protein
392                              csavignyi_homolog_ensembl_peptide
393                                   csavignyi_homolog_chromosome
394                                  csavignyi_homolog_chrom_start
395                                    csavignyi_homolog_chrom_end
396                               csavignyi_homolog_orthology_type
397                                      csavignyi_homolog_subtype
398                         csavignyi_homolog_orthology_confidence
399                                      csavignyi_homolog_perc_id
400                                   csavignyi_homolog_perc_id_r1
401                                           csavignyi_homolog_dn
402                                           csavignyi_homolog_ds
403                                lchalumnae_homolog_ensembl_gene
404                lchalumnae_homolog_canonical_transcript_protein
405                             lchalumnae_homolog_ensembl_peptide
406                                  lchalumnae_homolog_chromosome
407                                 lchalumnae_homolog_chrom_start
408                                   lchalumnae_homolog_chrom_end
409                              lchalumnae_homolog_orthology_type
410                                     lchalumnae_homolog_subtype
411                        lchalumnae_homolog_orthology_confidence
412                                     lchalumnae_homolog_perc_id
413                                  lchalumnae_homolog_perc_id_r1
414                                  saraneus_homolog_ensembl_gene
415                  saraneus_homolog_canonical_transcript_protein
416                               saraneus_homolog_ensembl_peptide
417                                    saraneus_homolog_chromosome
418                                   saraneus_homolog_chrom_start
419                                     saraneus_homolog_chrom_end
420                                saraneus_homolog_orthology_type
421                                       saraneus_homolog_subtype
422                          saraneus_homolog_orthology_confidence
423                                       saraneus_homolog_perc_id
424                                    saraneus_homolog_perc_id_r1
425                                   btaurus_homolog_ensembl_gene
426                   btaurus_homolog_canonical_transcript_protein
427                                btaurus_homolog_ensembl_peptide
428                                     btaurus_homolog_chromosome
429                                    btaurus_homolog_chrom_start
430                                      btaurus_homolog_chrom_end
431                                 btaurus_homolog_orthology_type
432                                        btaurus_homolog_subtype
433                           btaurus_homolog_orthology_confidence
434                                        btaurus_homolog_perc_id
435                                     btaurus_homolog_perc_id_r1
436                                             btaurus_homolog_dn
437                                             btaurus_homolog_ds
438                               cfamiliaris_homolog_ensembl_gene
439               cfamiliaris_homolog_canonical_transcript_protein
440                            cfamiliaris_homolog_ensembl_peptide
441                                 cfamiliaris_homolog_chromosome
442                                cfamiliaris_homolog_chrom_start
443                                  cfamiliaris_homolog_chrom_end
444                             cfamiliaris_homolog_orthology_type
445                                    cfamiliaris_homolog_subtype
446                       cfamiliaris_homolog_orthology_confidence
447                                    cfamiliaris_homolog_perc_id
448                                 cfamiliaris_homolog_perc_id_r1
449                                         cfamiliaris_homolog_dn
450                                         cfamiliaris_homolog_ds
451                                ttruncatus_homolog_ensembl_gene
452                ttruncatus_homolog_canonical_transcript_protein
453                             ttruncatus_homolog_ensembl_peptide
454                                  ttruncatus_homolog_chromosome
455                                 ttruncatus_homolog_chrom_start
456                                   ttruncatus_homolog_chrom_end
457                              ttruncatus_homolog_orthology_type
458                                     ttruncatus_homolog_subtype
459                        ttruncatus_homolog_orthology_confidence
460                                     ttruncatus_homolog_perc_id
461                                  ttruncatus_homolog_perc_id_r1
462                             dmelanogaster_homolog_ensembl_gene
463             dmelanogaster_homolog_canonical_transcript_protein
464                          dmelanogaster_homolog_ensembl_peptide
465                               dmelanogaster_homolog_chromosome
466                              dmelanogaster_homolog_chrom_start
467                                dmelanogaster_homolog_chrom_end
468                           dmelanogaster_homolog_orthology_type
469                                  dmelanogaster_homolog_subtype
470                     dmelanogaster_homolog_orthology_confidence
471                                  dmelanogaster_homolog_perc_id
472                               dmelanogaster_homolog_perc_id_r1
473                                       dmelanogaster_homolog_dn
474                                       dmelanogaster_homolog_ds
475                            aplatyrhynchos_homolog_ensembl_gene
476            aplatyrhynchos_homolog_canonical_transcript_protein
477                         aplatyrhynchos_homolog_ensembl_peptide
478                              aplatyrhynchos_homolog_chromosome
479                             aplatyrhynchos_homolog_chrom_start
480                               aplatyrhynchos_homolog_chrom_end
481                          aplatyrhynchos_homolog_orthology_type
482                                 aplatyrhynchos_homolog_subtype
483                    aplatyrhynchos_homolog_orthology_confidence
484                                 aplatyrhynchos_homolog_perc_id
485                              aplatyrhynchos_homolog_perc_id_r1
486                                      aplatyrhynchos_homolog_dn
487                                      aplatyrhynchos_homolog_ds
488                                 lafricana_homolog_ensembl_gene
489                 lafricana_homolog_canonical_transcript_protein
490                              lafricana_homolog_ensembl_peptide
491                                   lafricana_homolog_chromosome
492                                  lafricana_homolog_chrom_start
493                                    lafricana_homolog_chrom_end
494                               lafricana_homolog_orthology_type
495                                      lafricana_homolog_subtype
496                         lafricana_homolog_orthology_confidence
497                                      lafricana_homolog_perc_id
498                                   lafricana_homolog_perc_id_r1
499                                           lafricana_homolog_dn
500                                           lafricana_homolog_ds
501                                     mfuro_homolog_ensembl_gene
502                     mfuro_homolog_canonical_transcript_protein
503                                  mfuro_homolog_ensembl_peptide
504                                       mfuro_homolog_chromosome
505                                      mfuro_homolog_chrom_start
506                                        mfuro_homolog_chrom_end
507                                   mfuro_homolog_orthology_type
508                                          mfuro_homolog_subtype
509                             mfuro_homolog_orthology_confidence
510                                          mfuro_homolog_perc_id
511                                       mfuro_homolog_perc_id_r1
512                                               mfuro_homolog_dn
513                                               mfuro_homolog_ds
514                               falbicollis_homolog_ensembl_gene
515               falbicollis_homolog_canonical_transcript_protein
516                            falbicollis_homolog_ensembl_peptide
517                                 falbicollis_homolog_chromosome
518                                falbicollis_homolog_chrom_start
519                                  falbicollis_homolog_chrom_end
520                             falbicollis_homolog_orthology_type
521                                    falbicollis_homolog_subtype
522                       falbicollis_homolog_orthology_confidence
523                                    falbicollis_homolog_perc_id
524                                 falbicollis_homolog_perc_id_r1
525                                         falbicollis_homolog_dn
526                                         falbicollis_homolog_ds
527                                 trubripes_homolog_ensembl_gene
528                 trubripes_homolog_canonical_transcript_protein
529                              trubripes_homolog_ensembl_peptide
530                                   trubripes_homolog_chromosome
531                                  trubripes_homolog_chrom_start
532                                    trubripes_homolog_chrom_end
533                               trubripes_homolog_orthology_type
534                                      trubripes_homolog_subtype
535                         trubripes_homolog_orthology_confidence
536                                      trubripes_homolog_perc_id
537                                   trubripes_homolog_perc_id_r1
538                                           trubripes_homolog_dn
539                                           trubripes_homolog_ds
540                               nleucogenys_homolog_ensembl_gene
541               nleucogenys_homolog_canonical_transcript_protein
542                            nleucogenys_homolog_ensembl_peptide
543                                 nleucogenys_homolog_chromosome
544                                nleucogenys_homolog_chrom_start
545                                  nleucogenys_homolog_chrom_end
546                             nleucogenys_homolog_orthology_type
547                                    nleucogenys_homolog_subtype
548                       nleucogenys_homolog_orthology_confidence
549                                    nleucogenys_homolog_perc_id
550                                 nleucogenys_homolog_perc_id_r1
551                                         nleucogenys_homolog_dn
552                                         nleucogenys_homolog_ds
553                                  ggorilla_homolog_ensembl_gene
554                  ggorilla_homolog_canonical_transcript_protein
555                               ggorilla_homolog_ensembl_peptide
556                                    ggorilla_homolog_chromosome
557                                   ggorilla_homolog_chrom_start
558                                     ggorilla_homolog_chrom_end
559                                ggorilla_homolog_orthology_type
560                                       ggorilla_homolog_subtype
561                          ggorilla_homolog_orthology_confidence
562                                       ggorilla_homolog_perc_id
563                                    ggorilla_homolog_perc_id_r1
564                                            ggorilla_homolog_dn
565                                            ggorilla_homolog_ds
566                                cporcellus_homolog_ensembl_gene
567                cporcellus_homolog_canonical_transcript_protein
568                             cporcellus_homolog_ensembl_peptide
569                                  cporcellus_homolog_chromosome
570                                 cporcellus_homolog_chrom_start
571                                   cporcellus_homolog_chrom_end
572                              cporcellus_homolog_orthology_type
573                                     cporcellus_homolog_subtype
574                        cporcellus_homolog_orthology_confidence
575                                     cporcellus_homolog_perc_id
576                                  cporcellus_homolog_perc_id_r1
577                                          cporcellus_homolog_dn
578                                          cporcellus_homolog_ds
579                                eeuropaeus_homolog_ensembl_gene
580                eeuropaeus_homolog_canonical_transcript_protein
581                             eeuropaeus_homolog_ensembl_peptide
582                                  eeuropaeus_homolog_chromosome
583                                 eeuropaeus_homolog_chrom_start
584                                   eeuropaeus_homolog_chrom_end
585                              eeuropaeus_homolog_orthology_type
586                                     eeuropaeus_homolog_subtype
587                        eeuropaeus_homolog_orthology_confidence
588                                     eeuropaeus_homolog_perc_id
589                                  eeuropaeus_homolog_perc_id_r1
590                                 ecaballus_homolog_ensembl_gene
591                 ecaballus_homolog_canonical_transcript_protein
592                              ecaballus_homolog_ensembl_peptide
593                                   ecaballus_homolog_chromosome
594                                  ecaballus_homolog_chrom_start
595                                    ecaballus_homolog_chrom_end
596                               ecaballus_homolog_orthology_type
597                                      ecaballus_homolog_subtype
598                         ecaballus_homolog_orthology_confidence
599                                      ecaballus_homolog_perc_id
600                                   ecaballus_homolog_perc_id_r1
601                                           ecaballus_homolog_dn
602                                           ecaballus_homolog_ds
603                                    dordii_homolog_ensembl_gene
604                    dordii_homolog_canonical_transcript_protein
605                                 dordii_homolog_ensembl_peptide
606                                      dordii_homolog_chromosome
607                                     dordii_homolog_chrom_start
608                                       dordii_homolog_chrom_end
609                                  dordii_homolog_orthology_type
610                                         dordii_homolog_subtype
611                            dordii_homolog_orthology_confidence
612                                         dordii_homolog_perc_id
613                                      dordii_homolog_perc_id_r1
614                                  pmarinus_homolog_ensembl_gene
615                  pmarinus_homolog_canonical_transcript_protein
616                               pmarinus_homolog_ensembl_peptide
617                                    pmarinus_homolog_chromosome
618                                   pmarinus_homolog_chrom_start
619                                     pmarinus_homolog_chrom_end
620                                pmarinus_homolog_orthology_type
621                                       pmarinus_homolog_subtype
622                          pmarinus_homolog_orthology_confidence
623                                       pmarinus_homolog_perc_id
624                                    pmarinus_homolog_perc_id_r1
625                                 etelfairi_homolog_ensembl_gene
626                 etelfairi_homolog_canonical_transcript_protein
627                              etelfairi_homolog_ensembl_peptide
628                                   etelfairi_homolog_chromosome
629                                  etelfairi_homolog_chrom_start
630                                    etelfairi_homolog_chrom_end
631                               etelfairi_homolog_orthology_type
632                                      etelfairi_homolog_subtype
633                         etelfairi_homolog_orthology_confidence
634                                      etelfairi_homolog_perc_id
635                                   etelfairi_homolog_perc_id_r1
636                                  mmulatta_homolog_ensembl_gene
637                  mmulatta_homolog_canonical_transcript_protein
638                               mmulatta_homolog_ensembl_peptide
639                                    mmulatta_homolog_chromosome
640                                   mmulatta_homolog_chrom_start
641                                     mmulatta_homolog_chrom_end
642                                mmulatta_homolog_orthology_type
643                                       mmulatta_homolog_subtype
644                          mmulatta_homolog_orthology_confidence
645                                       mmulatta_homolog_perc_id
646                                    mmulatta_homolog_perc_id_r1
647                                            mmulatta_homolog_dn
648                                            mmulatta_homolog_ds
649                                  cjacchus_homolog_ensembl_gene
650                  cjacchus_homolog_canonical_transcript_protein
651                               cjacchus_homolog_ensembl_peptide
652                                    cjacchus_homolog_chromosome
653                                   cjacchus_homolog_chrom_start
654                                     cjacchus_homolog_chrom_end
655                                cjacchus_homolog_orthology_type
656                                       cjacchus_homolog_subtype
657                          cjacchus_homolog_orthology_confidence
658                                       cjacchus_homolog_perc_id
659                                    cjacchus_homolog_perc_id_r1
660                                            cjacchus_homolog_dn
661                                            cjacchus_homolog_ds
662                                  olatipes_homolog_ensembl_gene
663                  olatipes_homolog_canonical_transcript_protein
664                               olatipes_homolog_ensembl_peptide
665                                    olatipes_homolog_chromosome
666                                   olatipes_homolog_chrom_start
667                                     olatipes_homolog_chrom_end
668                                olatipes_homolog_orthology_type
669                                       olatipes_homolog_subtype
670                          olatipes_homolog_orthology_confidence
671                                       olatipes_homolog_perc_id
672                                    olatipes_homolog_perc_id_r1
673                                            olatipes_homolog_dn
674                                            olatipes_homolog_ds
675                                 pvampyrus_homolog_ensembl_gene
676                 pvampyrus_homolog_canonical_transcript_protein
677                              pvampyrus_homolog_ensembl_peptide
678                                   pvampyrus_homolog_chromosome
679                                  pvampyrus_homolog_chrom_start
680                                    pvampyrus_homolog_chrom_end
681                               pvampyrus_homolog_orthology_type
682                                      pvampyrus_homolog_subtype
683                         pvampyrus_homolog_orthology_confidence
684                                      pvampyrus_homolog_perc_id
685                                   pvampyrus_homolog_perc_id_r1
686                                mlucifugus_homolog_ensembl_gene
687                mlucifugus_homolog_canonical_transcript_protein
688                             mlucifugus_homolog_ensembl_peptide
689                                  mlucifugus_homolog_chromosome
690                                 mlucifugus_homolog_chrom_start
691                                   mlucifugus_homolog_chrom_end
692                              mlucifugus_homolog_orthology_type
693                                     mlucifugus_homolog_subtype
694                        mlucifugus_homolog_orthology_confidence
695                                     mlucifugus_homolog_perc_id
696                                  mlucifugus_homolog_perc_id_r1
697                                          mlucifugus_homolog_dn
698                                          mlucifugus_homolog_ds
699                                 mmusculus_homolog_ensembl_gene
700                 mmusculus_homolog_canonical_transcript_protein
701                              mmusculus_homolog_ensembl_peptide
702                                   mmusculus_homolog_chromosome
703                                  mmusculus_homolog_chrom_start
704                                    mmusculus_homolog_chrom_end
705                               mmusculus_homolog_orthology_type
706                                      mmusculus_homolog_subtype
707                         mmusculus_homolog_orthology_confidence
708                                      mmusculus_homolog_perc_id
709                                   mmusculus_homolog_perc_id_r1
710                                           mmusculus_homolog_dn
711                                           mmusculus_homolog_ds
712                                  mmurinus_homolog_ensembl_gene
713                  mmurinus_homolog_canonical_transcript_protein
714                               mmurinus_homolog_ensembl_peptide
715                                    mmurinus_homolog_chromosome
716                                   mmurinus_homolog_chrom_start
717                                     mmurinus_homolog_chrom_end
718                                mmurinus_homolog_orthology_type
719                                       mmurinus_homolog_subtype
720                          mmurinus_homolog_orthology_confidence
721                                       mmurinus_homolog_perc_id
722                                    mmurinus_homolog_perc_id_r1
723                                oniloticus_homolog_ensembl_gene
724                oniloticus_homolog_canonical_transcript_protein
725                             oniloticus_homolog_ensembl_peptide
726                                  oniloticus_homolog_chromosome
727                                 oniloticus_homolog_chrom_start
728                                   oniloticus_homolog_chrom_end
729                              oniloticus_homolog_orthology_type
730                                     oniloticus_homolog_subtype
731                        oniloticus_homolog_orthology_confidence
732                                     oniloticus_homolog_perc_id
733                                  oniloticus_homolog_perc_id_r1
734                                   panubis_homolog_ensembl_gene
735                   panubis_homolog_canonical_transcript_protein
736                                panubis_homolog_ensembl_peptide
737                                     panubis_homolog_chromosome
738                                    panubis_homolog_chrom_start
739                                      panubis_homolog_chrom_end
740                                 panubis_homolog_orthology_type
741                                        panubis_homolog_subtype
742                           panubis_homolog_orthology_confidence
743                                        panubis_homolog_perc_id
744                                     panubis_homolog_perc_id_r1
745                                             panubis_homolog_dn
746                                             panubis_homolog_ds
747                                mdomestica_homolog_ensembl_gene
748                mdomestica_homolog_canonical_transcript_protein
749                             mdomestica_homolog_ensembl_peptide
750                                  mdomestica_homolog_chromosome
751                                 mdomestica_homolog_chrom_start
752                                   mdomestica_homolog_chrom_end
753                              mdomestica_homolog_orthology_type
754                                     mdomestica_homolog_subtype
755                        mdomestica_homolog_orthology_confidence
756                                     mdomestica_homolog_perc_id
757                                  mdomestica_homolog_perc_id_r1
758                                          mdomestica_homolog_dn
759                                          mdomestica_homolog_ds
760                                   pabelii_homolog_ensembl_gene
761                   pabelii_homolog_canonical_transcript_protein
762                                pabelii_homolog_ensembl_peptide
763                                     pabelii_homolog_chromosome
764                                    pabelii_homolog_chrom_start
765                                      pabelii_homolog_chrom_end
766                                 pabelii_homolog_orthology_type
767                                        pabelii_homolog_subtype
768                           pabelii_homolog_orthology_confidence
769                                        pabelii_homolog_perc_id
770                                     pabelii_homolog_perc_id_r1
771                                             pabelii_homolog_dn
772                                             pabelii_homolog_ds
773                              amelanoleuca_homolog_ensembl_gene
774              amelanoleuca_homolog_canonical_transcript_protein
775                           amelanoleuca_homolog_ensembl_peptide
776                                amelanoleuca_homolog_chromosome
777                               amelanoleuca_homolog_chrom_start
778                                 amelanoleuca_homolog_chrom_end
779                            amelanoleuca_homolog_orthology_type
780                                   amelanoleuca_homolog_subtype
781                      amelanoleuca_homolog_orthology_confidence
782                                   amelanoleuca_homolog_perc_id
783                                amelanoleuca_homolog_perc_id_r1
784                                        amelanoleuca_homolog_dn
785                                        amelanoleuca_homolog_ds
786                                   sscrofa_homolog_ensembl_gene
787                   sscrofa_homolog_canonical_transcript_protein
788                                sscrofa_homolog_ensembl_peptide
789                                     sscrofa_homolog_chromosome
790                                    sscrofa_homolog_chrom_start
791                                      sscrofa_homolog_chrom_end
792                                 sscrofa_homolog_orthology_type
793                                        sscrofa_homolog_subtype
794                           sscrofa_homolog_orthology_confidence
795                                        sscrofa_homolog_perc_id
796                                     sscrofa_homolog_perc_id_r1
797                                             sscrofa_homolog_dn
798                                             sscrofa_homolog_ds
799                                 oprinceps_homolog_ensembl_gene
800                 oprinceps_homolog_canonical_transcript_protein
801                              oprinceps_homolog_ensembl_peptide
802                                   oprinceps_homolog_chromosome
803                                  oprinceps_homolog_chrom_start
804                                    oprinceps_homolog_chrom_end
805                               oprinceps_homolog_orthology_type
806                                      oprinceps_homolog_subtype
807                         oprinceps_homolog_orthology_confidence
808                                      oprinceps_homolog_perc_id
809                                   oprinceps_homolog_perc_id_r1
810                                xmaculatus_homolog_ensembl_gene
811                xmaculatus_homolog_canonical_transcript_protein
812                             xmaculatus_homolog_ensembl_peptide
813                                  xmaculatus_homolog_chromosome
814                                 xmaculatus_homolog_chrom_start
815                                   xmaculatus_homolog_chrom_end
816                              xmaculatus_homolog_orthology_type
817                                     xmaculatus_homolog_subtype
818                        xmaculatus_homolog_orthology_confidence
819                                     xmaculatus_homolog_perc_id
820                                  xmaculatus_homolog_perc_id_r1
821                                 oanatinus_homolog_ensembl_gene
822                 oanatinus_homolog_canonical_transcript_protein
823                              oanatinus_homolog_ensembl_peptide
824                                   oanatinus_homolog_chromosome
825                                  oanatinus_homolog_chrom_start
826                                    oanatinus_homolog_chrom_end
827                               oanatinus_homolog_orthology_type
828                                      oanatinus_homolog_subtype
829                         oanatinus_homolog_orthology_confidence
830                                      oanatinus_homolog_perc_id
831                                   oanatinus_homolog_perc_id_r1
832                                           oanatinus_homolog_dn
833                                           oanatinus_homolog_ds
834                                ocuniculus_homolog_ensembl_gene
835                ocuniculus_homolog_canonical_transcript_protein
836                             ocuniculus_homolog_ensembl_peptide
837                                  ocuniculus_homolog_chromosome
838                                 ocuniculus_homolog_chrom_start
839                                   ocuniculus_homolog_chrom_end
840                              ocuniculus_homolog_orthology_type
841                                     ocuniculus_homolog_subtype
842                        ocuniculus_homolog_orthology_confidence
843                                     ocuniculus_homolog_perc_id
844                                  ocuniculus_homolog_perc_id_r1
845                                          ocuniculus_homolog_dn
846                                          ocuniculus_homolog_ds
847                               rnorvegicus_homolog_ensembl_gene
848               rnorvegicus_homolog_canonical_transcript_protein
849                            rnorvegicus_homolog_ensembl_peptide
850                                 rnorvegicus_homolog_chromosome
851                                rnorvegicus_homolog_chrom_start
852                                  rnorvegicus_homolog_chrom_end
853                             rnorvegicus_homolog_orthology_type
854                                    rnorvegicus_homolog_subtype
855                       rnorvegicus_homolog_orthology_confidence
856                                    rnorvegicus_homolog_perc_id
857                                 rnorvegicus_homolog_perc_id_r1
858                                         rnorvegicus_homolog_dn
859                                         rnorvegicus_homolog_ds
860                                 pcapensis_homolog_ensembl_gene
861                 pcapensis_homolog_canonical_transcript_protein
862                              pcapensis_homolog_ensembl_peptide
863                                   pcapensis_homolog_chromosome
864                                  pcapensis_homolog_chrom_start
865                                    pcapensis_homolog_chrom_end
866                               pcapensis_homolog_orthology_type
867                                      pcapensis_homolog_subtype
868                         pcapensis_homolog_orthology_confidence
869                                      pcapensis_homolog_perc_id
870                                   pcapensis_homolog_perc_id_r1
871                                    oaries_homolog_ensembl_gene
872                    oaries_homolog_canonical_transcript_protein
873                                 oaries_homolog_ensembl_peptide
874                                      oaries_homolog_chromosome
875                                     oaries_homolog_chrom_start
876                                       oaries_homolog_chrom_end
877                                  oaries_homolog_orthology_type
878                                         oaries_homolog_subtype
879                            oaries_homolog_orthology_confidence
880                                         oaries_homolog_perc_id
881                                      oaries_homolog_perc_id_r1
882                                              oaries_homolog_dn
883                                              oaries_homolog_ds
884                                choffmanni_homolog_ensembl_gene
885                choffmanni_homolog_canonical_transcript_protein
886                             choffmanni_homolog_ensembl_peptide
887                                  choffmanni_homolog_chromosome
888                                 choffmanni_homolog_chrom_start
889                                   choffmanni_homolog_chrom_end
890                              choffmanni_homolog_orthology_type
891                                     choffmanni_homolog_subtype
892                        choffmanni_homolog_orthology_confidence
893                                     choffmanni_homolog_perc_id
894                                  choffmanni_homolog_perc_id_r1
895                                 loculatus_homolog_ensembl_gene
896                 loculatus_homolog_canonical_transcript_protein
897                              loculatus_homolog_ensembl_peptide
898                                   loculatus_homolog_chromosome
899                                  loculatus_homolog_chrom_start
900                                    loculatus_homolog_chrom_end
901                               loculatus_homolog_orthology_type
902                                      loculatus_homolog_subtype
903                         loculatus_homolog_orthology_confidence
904                                      loculatus_homolog_perc_id
905                                   loculatus_homolog_perc_id_r1
906                         itridecemlineatus_homolog_ensembl_gene
907         itridecemlineatus_homolog_canonical_transcript_protein
908                      itridecemlineatus_homolog_ensembl_peptide
909                           itridecemlineatus_homolog_chromosome
910                          itridecemlineatus_homolog_chrom_start
911                            itridecemlineatus_homolog_chrom_end
912                       itridecemlineatus_homolog_orthology_type
913                              itridecemlineatus_homolog_subtype
914                 itridecemlineatus_homolog_orthology_confidence
915                              itridecemlineatus_homolog_perc_id
916                           itridecemlineatus_homolog_perc_id_r1
917                                   itridecemlineatus_homolog_dn
918                                   itridecemlineatus_homolog_ds
919                                gaculeatus_homolog_ensembl_gene
920                gaculeatus_homolog_canonical_transcript_protein
921                             gaculeatus_homolog_ensembl_peptide
922                                  gaculeatus_homolog_chromosome
923                                 gaculeatus_homolog_chrom_start
924                                   gaculeatus_homolog_chrom_end
925                              gaculeatus_homolog_orthology_type
926                                     gaculeatus_homolog_subtype
927                        gaculeatus_homolog_orthology_confidence
928                                     gaculeatus_homolog_perc_id
929                                  gaculeatus_homolog_perc_id_r1
930                                          gaculeatus_homolog_dn
931                                          gaculeatus_homolog_ds
932                                 tsyrichta_homolog_ensembl_gene
933                 tsyrichta_homolog_canonical_transcript_protein
934                              tsyrichta_homolog_ensembl_peptide
935                                   tsyrichta_homolog_chromosome
936                                  tsyrichta_homolog_chrom_start
937                                    tsyrichta_homolog_chrom_end
938                               tsyrichta_homolog_orthology_type
939                                      tsyrichta_homolog_subtype
940                         tsyrichta_homolog_orthology_confidence
941                                      tsyrichta_homolog_perc_id
942                                   tsyrichta_homolog_perc_id_r1
943                                 sharrisii_homolog_ensembl_gene
944                 sharrisii_homolog_canonical_transcript_protein
945                              sharrisii_homolog_ensembl_peptide
946                                   sharrisii_homolog_chromosome
947                                  sharrisii_homolog_chrom_start
948                                    sharrisii_homolog_chrom_end
949                               sharrisii_homolog_orthology_type
950                                      sharrisii_homolog_subtype
951                         sharrisii_homolog_orthology_confidence
952                                      sharrisii_homolog_perc_id
953                                   sharrisii_homolog_perc_id_r1
954                                           sharrisii_homolog_dn
955                                           sharrisii_homolog_ds
956                             tnigroviridis_homolog_ensembl_gene
957             tnigroviridis_homolog_canonical_transcript_protein
958                          tnigroviridis_homolog_ensembl_peptide
959                               tnigroviridis_homolog_chromosome
960                              tnigroviridis_homolog_chrom_start
961                                tnigroviridis_homolog_chrom_end
962                           tnigroviridis_homolog_orthology_type
963                                  tnigroviridis_homolog_subtype
964                     tnigroviridis_homolog_orthology_confidence
965                                  tnigroviridis_homolog_perc_id
966                               tnigroviridis_homolog_perc_id_r1
967                                       tnigroviridis_homolog_dn
968                                       tnigroviridis_homolog_ds
969                                tbelangeri_homolog_ensembl_gene
970                tbelangeri_homolog_canonical_transcript_protein
971                             tbelangeri_homolog_ensembl_peptide
972                                  tbelangeri_homolog_chromosome
973                                 tbelangeri_homolog_chrom_start
974                                   tbelangeri_homolog_chrom_end
975                              tbelangeri_homolog_orthology_type
976                                     tbelangeri_homolog_subtype
977                        tbelangeri_homolog_orthology_confidence
978                                     tbelangeri_homolog_perc_id
979                                  tbelangeri_homolog_perc_id_r1
980                                mgallopavo_homolog_ensembl_gene
981                mgallopavo_homolog_canonical_transcript_protein
982                             mgallopavo_homolog_ensembl_peptide
983                                  mgallopavo_homolog_chromosome
984                                 mgallopavo_homolog_chrom_start
985                                   mgallopavo_homolog_chrom_end
986                              mgallopavo_homolog_orthology_type
987                                     mgallopavo_homolog_subtype
988                        mgallopavo_homolog_orthology_confidence
989                                     mgallopavo_homolog_perc_id
990                                  mgallopavo_homolog_perc_id_r1
991                                          mgallopavo_homolog_dn
992                                          mgallopavo_homolog_ds
993                                  csabaeus_homolog_ensembl_gene
994                  csabaeus_homolog_canonical_transcript_protein
995                               csabaeus_homolog_ensembl_peptide
996                                    csabaeus_homolog_chromosome
997                                   csabaeus_homolog_chrom_start
998                                     csabaeus_homolog_chrom_end
999                                csabaeus_homolog_orthology_type
1000                                      csabaeus_homolog_subtype
1001                         csabaeus_homolog_orthology_confidence
1002                                      csabaeus_homolog_perc_id
1003                                   csabaeus_homolog_perc_id_r1
1004                                           csabaeus_homolog_dn
1005                                           csabaeus_homolog_ds
1006                                 meugenii_homolog_ensembl_gene
1007                 meugenii_homolog_canonical_transcript_protein
1008                              meugenii_homolog_ensembl_peptide
1009                                   meugenii_homolog_chromosome
1010                                  meugenii_homolog_chrom_start
1011                                    meugenii_homolog_chrom_end
1012                               meugenii_homolog_orthology_type
1013                                      meugenii_homolog_subtype
1014                         meugenii_homolog_orthology_confidence
1015                                      meugenii_homolog_perc_id
1016                                   meugenii_homolog_perc_id_r1
1017                              xtropicalis_homolog_ensembl_gene
1018              xtropicalis_homolog_canonical_transcript_protein
1019                           xtropicalis_homolog_ensembl_peptide
1020                                xtropicalis_homolog_chromosome
1021                               xtropicalis_homolog_chrom_start
1022                                 xtropicalis_homolog_chrom_end
1023                            xtropicalis_homolog_orthology_type
1024                                   xtropicalis_homolog_subtype
1025                      xtropicalis_homolog_orthology_confidence
1026                                   xtropicalis_homolog_perc_id
1027                                xtropicalis_homolog_perc_id_r1
1028                                        xtropicalis_homolog_dn
1029                                        xtropicalis_homolog_ds
1030                              scerevisiae_homolog_ensembl_gene
1031              scerevisiae_homolog_canonical_transcript_protein
1032                           scerevisiae_homolog_ensembl_peptide
1033                                scerevisiae_homolog_chromosome
1034                               scerevisiae_homolog_chrom_start
1035                                 scerevisiae_homolog_chrom_end
1036                            scerevisiae_homolog_orthology_type
1037                                   scerevisiae_homolog_subtype
1038                      scerevisiae_homolog_orthology_confidence
1039                                   scerevisiae_homolog_perc_id
1040                                scerevisiae_homolog_perc_id_r1
1041                                        scerevisiae_homolog_dn
1042                                        scerevisiae_homolog_ds
1043                                 tguttata_homolog_ensembl_gene
1044                 tguttata_homolog_canonical_transcript_protein
1045                              tguttata_homolog_ensembl_peptide
1046                                   tguttata_homolog_chromosome
1047                                  tguttata_homolog_chrom_start
1048                                    tguttata_homolog_chrom_end
1049                               tguttata_homolog_orthology_type
1050                                      tguttata_homolog_subtype
1051                         tguttata_homolog_orthology_confidence
1052                                      tguttata_homolog_perc_id
1053                                   tguttata_homolog_perc_id_r1
1054                                           tguttata_homolog_dn
1055                                           tguttata_homolog_ds
1056                                   drerio_homolog_ensembl_gene
1057                   drerio_homolog_canonical_transcript_protein
1058                                drerio_homolog_ensembl_peptide
1059                                     drerio_homolog_chromosome
1060                                    drerio_homolog_chrom_start
1061                                      drerio_homolog_chrom_end
1062                                 drerio_homolog_orthology_type
1063                                        drerio_homolog_subtype
1064                           drerio_homolog_orthology_confidence
1065                                        drerio_homolog_perc_id
1066                                     drerio_homolog_perc_id_r1
1067                                             drerio_homolog_dn
1068                                             drerio_homolog_ds
1069                                 hsapiens_paralog_ensembl_gene
1070                 hsapiens_paralog_canonical_transcript_protein
1071                              hsapiens_paralog_ensembl_peptide
1072                                   hsapiens_paralog_chromosome
1073                                  hsapiens_paralog_chrom_start
1074                                    hsapiens_paralog_chrom_end
1075                               hsapiens_paralog_orthology_type
1076                                      hsapiens_paralog_subtype
1077                          hsapiens_paralog_paralogy_confidence
1078                                      hsapiens_paralog_perc_id
1079                                   hsapiens_paralog_perc_id_r1
1080                                           hsapiens_paralog_dn
1081                                           hsapiens_paralog_ds
1082                                               ensembl_gene_id
1083                                         ensembl_transcript_id
1084                                            ensembl_peptide_id
1085                                               chromosome_name
1086                                                start_position
1087                                                  end_position
1088                                                        strand
1089                                                          band
1090                                            external_gene_name
1091                                          external_gene_source
1092                                              transcript_count
1093                                         percentage_gc_content
1094                                                   description
1095                                                variation_name
1096                                    germ_line_variation_source
1097                                            source_description
1098                                                        allele
1099                                                     validated
1100                                                     mapweight
1101                                                  minor_allele
1102                                             minor_allele_freq
1103                                            minor_allele_count
1104                                         clinical_significance
1105                                           transcript_location
1106                                         snp_chromosome_strand
1107                                              peptide_location
1108                                              chromosome_start
1109                                                chromosome_end
1110                                      polyphen_prediction_2076
1111                                           polyphen_score_2076
1112                                          sift_prediction_2076
1113                                               sift_score_2076
1114                                   distance_to_transcript_2076
1115                                                cds_start_2076
1116                                                  cds_end_2076
1117                                                 peptide_shift
1118                                             synonymous_status
1119                                            allele_string_2076
1120                                               ensembl_gene_id
1121                                         ensembl_transcript_id
1122                                            ensembl_peptide_id
1123                                               chromosome_name
1124                                                start_position
1125                                                  end_position
1126                                                        strand
1127                                                          band
1128                                            external_gene_name
1129                                          external_gene_source
1130                                              transcript_count
1131                                         percentage_gc_content
1132                                                   description
1133                                        somatic_variation_name
1134                                           somatic_source_name
1135                                    somatic_source_description
1136                                                somatic_allele
1137                                             somatic_validated
1138                                             somatic_mapweight
1139                                          somatic_minor_allele
1140                                     somatic_minor_allele_freq
1141                                    somatic_minor_allele_count
1142                                 somatic_clinical_significance
1143                                   somatic_transcript_location
1144                                 somatic_snp_chromosome_strand
1145                                      somatic_peptide_location
1146                                      somatic_chromosome_start
1147                                        somatic_chromosome_end
1148    mart_transcript_variation_som__dm_polyphen_prediction_2076
1149         mart_transcript_variation_som__dm_polyphen_score_2076
1150        mart_transcript_variation_som__dm_sift_prediction_2076
1151             mart_transcript_variation_som__dm_sift_score_2076
1152 mart_transcript_variation_som__dm_distance_to_transcript_2076
1153                                        somatic_cds_start_2076
1154                                          somatic_cds_end_2076
1155                                         somatic_peptide_shift
1156                                     somatic_synonymous_status
1157          mart_transcript_variation_som__dm_allele_string_2076
1158                                        transcript_exon_intron
1159                                              gene_exon_intron
1160                                              transcript_flank
1161                                                    gene_flank
1162                                       coding_transcript_flank
1163                                             coding_gene_flank
1164                                                          5utr
1165                                                          3utr
1166                                                     gene_exon
1167                                                          cdna
1168                                                        coding
1169                                                       peptide
1170                                                upstream_flank
1171                                              downstream_flank
1172                                               ensembl_gene_id
1173                                                   description
1174                                            external_gene_name
1175                                          external_gene_source
1176                                               chromosome_name
1177                                                start_position
1178                                                  end_position
1179                                                  gene_biotype
1180                                                        family
1181                                             cdna_coding_start
1182                                               cdna_coding_end
1183                                                   5_utr_start
1184                                                     5_utr_end
1185                                                   3_utr_start
1186                                                     3_utr_end
1187                                         ensembl_transcript_id
1188                                            ensembl_peptide_id
1189                                            transcript_biotype
1190                                                        strand
1191                                              transcript_start
1192                                                transcript_end
1193                                      transcription_start_site
1194                                             transcript_length
1195                                                    cds_length
1196                                                     cds_start
1197                                                       cds_end
1198                                               ensembl_exon_id
1199                                              exon_chrom_start
1200                                                exon_chrom_end
1201                                                        strand
1202                                                          rank
1203                                                         phase
1204                                             cdna_coding_start
1205                                               cdna_coding_end
1206                                          genomic_coding_start
1207                                            genomic_coding_end
1208                                               is_constitutive
                                                  description
1                                             Ensembl Gene ID
2                                       Ensembl Transcript ID
3                                          Ensembl Protein ID
4                                             Ensembl Exon ID
5                                                 Description
6                                             Chromosome Name
7                                             Gene Start (bp)
8                                               Gene End (bp)
9                                                      Strand
10                                                       Band
11                                      Transcript Start (bp)
12                                        Transcript End (bp)
13                             Transcription Start Site (TSS)
14                 Transcript length (including UTRs and CDS)
15                             Transcript Support Level (TSL)
16                                   GENCODE basic annotation
17                                          APPRIS annotation
18                                       Associated Gene Name
19                                     Associated Gene Source
20                                 Associated Transcript Name
21                               Associated Transcript Source
22                                           Transcript count
23                                               % GC content
24                                                  Gene type
25                                            Transcript type
26                                              Source (gene)
27                                        Source (transcript)
28                                              Status (gene)
29                                        Status (transcript)
30                                             Version (gene)
31                                       Version (transcript)
32                                      Phenotype description
33                                                Source name
34                                   Study External Reference
35                                          GO Term Accession
36                                               GO Term Name
37                                         GO Term Definition
38                                      GO Term Evidence Code
39                                                  GO domain
40                                    GOSlim GOA Accession(s)
41                                     GOSlim GOA Description
42                                               ArrayExpress
43                                               ChEMBL ID(s)
44                              Clone based Ensembl gene name
45                        Clone based Ensembl transcript name
46                                 Clone based VEGA gene name
47                           Clone based VEGA transcript name
48                                                    CCDS ID
49          Database of Aberrant 3' Splice Sites (DBASS3) IDs
50                                           DBASS3 Gene Name
51          Database of Aberrant 5' Splice Sites (DBASS5) IDs
52                                           DBASS5 Gene Name
53                                          EMBL (Genbank) ID
54                               Ensembl Human Transcript IDs
55                              Ensembl Human Translation IDs
56                                   LRG to Ensembl link gene
57                             LRG to Ensembl link transcript
58                                              EntrezGene ID
59                              EntrezGene transcript name ID
60                            Human Protein Atlas Antibody ID
61                                     VEGA gene ID(s) (OTTG)
62                               VEGA transcript ID(s) (OTTT)
63                                  VEGA protein ID(s) (OTTP)
64                                                 HGNC ID(s)
65                                                HGNC symbol
66                                       HGNC transcript name
67                                                  MEROPS ID
68                                       MIM Morbid Accession
69                                     MIM Morbid Description
70                                         MIM Gene Accession
71                                       MIM Gene Description
72                                       miRBase Accession(s)
73                                              miRBase ID(s)
74                                    miRBase transcript name
75                                                     PDB ID
76                       Protein (Genbank) ID [e.g. AAA02487]
77                                                Reactome ID
78                         Reactome gene ID [e.g. REACT_1006]
79                  Reactome transcript ID [e.g. REACT_11045]
80                            RefSeq mRNA [e.g. NM_001195597]
81                  RefSeq mRNA predicted [e.g. XM_001125684]
82                              RefSeq ncRNA [e.g. NR_002834]
83                    RefSeq ncRNA predicted [e.g. XR_108264]
84                      RefSeq Protein ID [e.g. NP_001005353]
85            RefSeq Predicted Protein ID [e.g. XP_001720922]
86                                                    Rfam ID
87                                       Rfam transcript name
88                                              RNACentral ID
89                                                    UCSC ID
90                                                 Unigene ID
91                                                    UniParc
92                                   UniProt/TrEMBL Accession
93                                UniProt/SwissProt Accession
94                                          UniProt Gene Name
95                                              WikiGene Name
96                                                WikiGene ID
97                                       WikiGene Description
98                        Agilent SurePrint G3 GE 8x60k probe
99                     Agilent SurePrint G3 GE 8x60k v2 probe
100                        Agilent WholeGenome 4x44k v1 probe
101                        Agilent WholeGenome 4x44k v2 probe
102                                     Affy HC G110 probeset
103                                    Affy HG FOCUS probeset
104                              Affy HG U133-PLUS-2 probeset
105                                  Affy HG U133A_2 probeset
106                                    Affy HG U133A probeset
107                                    Affy HG U133B probeset
108                                   Affy HG U95AV2 probeset
109                                     Affy HG U95B probeset
110                                     Affy HG U95C probeset
111                                     Affy HG U95D probeset
112                                     Affy HG U95E probeset
113                                     Affy HG U95A probeset
114                                   Affy HuGene FL probeset
115                                     Affy HTA-2_0 probeset
116                              Affy HuEx 1_0 st v2 probeset
117                            Affy HuGene 1_0 st v1 probeset
118                            Affy HuGene 2_0 st v1 probeset
119                                            Affy primeview
120                                    Affy U133 X3P probeset
121                                     Agilent CGH 44b probe
122                                            Codelink probe
123                               Illumina HumanWG 6 v1 probe
124                               Illumina HumanWG 6 v2 probe
125                               Illumina HumanWG 6 v3 probe
126                            Illumina Human HT 12 V3 probe 
127                            Illumina Human HT 12 V4 probe 
128                             Illumina Human Ref 8 V3 probe
129                                    Phalanx OneArray probe
130                              Ensembl Protein Family ID(s)
131                                Ensembl Family Description
132                                                  PIRSF ID
133                                               PIRSF start
134                                                 PIRSF end
135                                            SUPERFAMILY ID
136                                         SUPERFAMILY start
137                                           SUPERFAMILY end
138                                                  SMART ID
139                                               SMART start
140                                                 SMART end
141                                        HAMAP Accession ID
142                                               HAMAP start
143                                                 HAMAP end
144                                                 Pfscan ID
145                                              Pfscan start
146                                                Pfscan end
147                                            ScanProsite ID
148                                         ScanProsite start
149                                           ScanProsite end
150                                                 PRINTS ID
151                                              PRINTS start
152                                                PRINTS end
153                                                   Pfam ID
154                                                Pfam start
155                                                  Pfam end
156                                                TIGRFAM ID
157                                             TIGRFAM start
158                                               TIGRFAM end
159                                                 Gene3D ID
160                                              Gene3D start
161                                                Gene3D end
162                                             HMMPanther ID
163                                          HMMPanther start
164                                            HMMPanther end
165                                               Interpro ID
166                                Interpro Short Description
167                                      Interpro Description
168                                            Interpro start
169                                              Interpro end
170                                      low complexity (SEG)
171                                low complexity (SEG) start
172                                  low complexity (SEG) end
173                              Transmembrane domain (tmhmm)
174                        Transmembrane domain (tmhmm) start
175                          Transmembrane domain (tmhmm) end
176                                            signal peptide
177                                      signal peptide start
178                                        signal peptide end
179                                      coiled coil (ncoils)
180                                coiled coil (ncoils) start
181                                  coiled coil (ncoils) end
182                                           Ensembl Gene ID
183                                     Ensembl Transcript ID
184                                        Ensembl Protein ID
185                                           Chromosome Name
186                                           Gene Start (bp)
187                                             Gene End (bp)
188                                     Transcript Start (bp)
189                                       Transcript End (bp)
190                            Transcription Start Site (TSS)
191                Transcript length (including UTRs and CDS)
192                                                    Strand
193                                      Associated Gene Name
194                                    Associated Gene Source
195                                              5' UTR Start
196                                                5' UTR End
197                                              3' UTR Start
198                                                3' UTR End
199                                                CDS Length
200                                          Transcript count
201                                               Description
202                                                 Gene type
203                                       Exon Chr Start (bp)
204                                         Exon Chr End (bp)
205                                         Constitutive Exon
206                                   Exon Rank in Transcript
207                                                     phase
208                                                 end phase
209                                         cDNA coding start
210                                           cDNA coding end
211                                      Genomic coding start
212                                        Genomic coding end
213                                           Ensembl Exon ID
214                                                 CDS Start
215                                                   CDS End
216                                           Ensembl Gene ID
217                                     Ensembl Transcript ID
218                                        Ensembl Protein ID
219                                           Chromosome Name
220                                           Gene Start (bp)
221                                             Gene End (bp)
222                                                    Strand
223                                                      Band
224                                      Associated Gene Name
225                                    Associated Gene Source
226                                          Transcript count
227                                              % GC content
228                                               Description
229                                    Alpaca Ensembl Gene ID
230                        Canonical Protein or Transcript ID
231                                 Alpaca Ensembl Protein ID
232                                    Alpaca Chromosome Name
233                              Alpaca Chromosome Start (bp)
234                                Alpaca Chromosome End (bp)
235                                             Homology Type
236                                                  Ancestor
237                      Orthology confidence [0 low, 1 high]
238                     % Identity with respect to query gene
239                    % Identity with respect to Alpaca gene
240                              Amazon molly Ensembl Gene ID
241                        Canonical Protein or Transcript ID
242                           Amazon molly Ensembl Protein ID
243                              Amazon molly Chromosome Name
244                        Amazon molly Chromosome Start (bp)
245                          Amazon molly Chromosome End (bp)
246                                             Homology Type
247                                                  Ancestor
248                      Orthology confidence [0 low, 1 high]
249                     % Identity with respect to query gene
250              % Identity with respect to Amazon molly gene
251                              Anole Lizard Ensembl Gene ID
252                        Canonical Protein or Transcript ID
253                           Anole Lizard Ensembl Protein ID
254                              Anole Lizard Chromosome Name
255                        Anole Lizard Chromosome Start (bp)
256                          Anole Lizard Chromosome End (bp)
257                                             Homology Type
258                                                  Ancestor
259                      Orthology confidence [0 low, 1 high]
260                     % Identity with respect to query gene
261              % Identity with respect to Anole Lizard gene
262                                                        dN
263                                                        dS
264                                 Armadillo Ensembl Gene ID
265                        Canonical Protein or Transcript ID
266                              Armadillo Ensembl Protein ID
267                                 Armadillo Chromosome Name
268                           Armadillo Chromosome Start (bp)
269                             Armadillo Chromosome End (bp)
270                                             Homology Type
271                                                  Ancestor
272                      Orthology confidence [0 low, 1 high]
273                     % Identity with respect to query gene
274                 % Identity with respect to Armadillo gene
275                                                        dN
276                                                        dS
277                              Atlantic Cod Ensembl Gene ID
278                        Canonical Protein or Transcript ID
279                           Atlantic Cod Ensembl Protein ID
280                              Atlantic Cod Chromosome Name
281                        Atlantic Cod Chromosome Start (bp)
282                          Atlantic Cod Chromosome End (bp)
283                                             Homology Type
284                                                  Ancestor
285                      Orthology confidence [0 low, 1 high]
286                     % Identity with respect to query gene
287              % Identity with respect to Atlantic Cod gene
288                                  Bushbaby Ensembl Gene ID
289                        Canonical Protein or Transcript ID
290                               Bushbaby Ensembl Protein ID
291                                  Bushbaby Chromosome Name
292                            Bushbaby Chromosome Start (bp)
293                              Bushbaby Chromosome End (bp)
294                                             Homology Type
295                                                  Ancestor
296                      Orthology confidence [0 low, 1 high]
297                     % Identity with respect to query gene
298                  % Identity with respect to Bushbaby gene
299                                                        dN
300                                                        dS
301              Caenorhabditis elegans Chromosome Start (bp)
302                    Caenorhabditis elegans Ensembl Gene ID
303                        Canonical Protein or Transcript ID
304                 Caenorhabditis elegans Ensembl Protein ID
305                    Caenorhabditis elegans Chromosome Name
306                Caenorhabditis elegans Chromosome End (bp)
307                                             Homology Type
308                                                  Ancestor
309                      Orthology confidence [0 low, 1 high]
310                     % Identity with respect to query gene
311    % Identity with respect to Caenorhabditis elegans gene
312                                                        dN
313                                                        dS
314                                       Cat Ensembl Gene ID
315                        Canonical Protein or Transcript ID
316                                    Cat Ensembl Protein ID
317                                       Cat Chromosome Name
318                                 Cat Chromosome Start (bp)
319                                   Cat Chromosome End (bp)
320                                             Homology Type
321                                                  Ancestor
322                      Orthology confidence [0 low, 1 high]
323                     % Identity with respect to query gene
324                       % Identity with respect to Cat gene
325                                                        dN
326                                                        dS
327                                 Cave fish Ensembl Gene ID
328                        Canonical Protein or Transcript ID
329                              Cave fish Ensembl Protein ID
330                                 Cave fish Chromosome Name
331                           Cave fish Chromosome Start (bp)
332                             Cave fish Chromosome End (bp)
333                                             Homology Type
334                                                  Ancestor
335                      Orthology confidence [0 low, 1 high]
336                     % Identity with respect to query gene
337                 % Identity with respect to Cave fish gene
338                                   Chicken Ensembl Gene ID
339                        Canonical Protein or Transcript ID
340                                Chicken Ensembl Protein ID
341                                   Chicken Chromosome Name
342                             Chicken Chromosome Start (bp)
343                               Chicken Chromosome End (bp)
344                                             Homology Type
345                                                  Ancestor
346                      Orthology confidence [0 low, 1 high]
347                     % Identity with respect to query gene
348                   % Identity with respect to Chicken gene
349                                                        dN
350                                                        dS
351                                Chimpanzee Ensembl Gene ID
352                        Canonical Protein or Transcript ID
353                             Chimpanzee Ensembl Protein ID
354                                Chimpanzee Chromosome Name
355                          Chimpanzee Chromosome Start (bp)
356                            Chimpanzee Chromosome End (bp)
357                                             Homology Type
358                                                  Ancestor
359                      Orthology confidence [0 low, 1 high]
360                     % Identity with respect to query gene
361                % Identity with respect to Chimpanzee gene
362                                                        dN
363                                                        dS
364                  Chinese softshell turtle Ensembl Gene ID
365                        Canonical Protein or Transcript ID
366               Chinese softshell turtle Ensembl Protein ID
367                  Chinese softshell turtle Chromosome Name
368            Chinese softshell turtle Chromosome Start (bp)
369              Chinese softshell turtle Chromosome End (bp)
370                                             Homology Type
371                                                  Ancestor
372                      Orthology confidence [0 low, 1 high]
373                     % Identity with respect to query gene
374  % Identity with respect to Chinese softshell turtle gene
375                                                        dN
376                                                        dS
377                        Ciona intestinalis Ensembl Gene ID
378                        Canonical Protein or Transcript ID
379                     Ciona intestinalis Ensembl Protein ID
380                        Ciona intestinalis Chromosome Name
381                  Ciona intestinalis Chromosome Start (bp)
382                    Ciona intestinalis Chromosome End (bp)
383                                             Homology Type
384                                                  Ancestor
385                      Orthology confidence [0 low, 1 high]
386                     % Identity with respect to query gene
387        % Identity with respect to Ciona intestinalis gene
388                                                        dN
389                                                        dS
390                            Ciona savignyi Ensembl Gene ID
391                        Canonical Protein or Transcript ID
392                         Ciona savignyi Ensembl Protein ID
393                            Ciona savignyi Chromosome Name
394                      Ciona savignyi Chromosome Start (bp)
395                        Ciona savignyi Chromosome End (bp)
396                                             Homology Type
397                                                  Ancestor
398                      Orthology confidence [0 low, 1 high]
399                     % Identity with respect to query gene
400            % Identity with respect to Ciona savignyi gene
401                                                        dN
402                                                        dS
403                                Coelacanth Ensembl Gene ID
404                        Canonical Protein or Transcript ID
405                             Coelacanth Ensembl Protein ID
406                                Coelacanth Chromosome Name
407                          Coelacanth Chromosome Start (bp)
408                            Coelacanth Chromosome End (bp)
409                                             Homology Type
410                                                  Ancestor
411                      Orthology confidence [0 low, 1 high]
412                     % Identity with respect to query gene
413                % Identity with respect to Coelacanth gene
414                              Common Shrew Ensembl Gene ID
415                        Canonical Protein or Transcript ID
416                           Common Shrew Ensembl Protein ID
417                              Common Shrew Chromosome Name
418                        Common Shrew Chromosome Start (bp)
419                          Common Shrew Chromosome End (bp)
420                                             Homology Type
421                                                  Ancestor
422                      Orthology confidence [0 low, 1 high]
423                     % Identity with respect to query gene
424              % Identity with respect to Common Shrew gene
425                                       Cow Ensembl Gene ID
426                        Canonical Protein or Transcript ID
427                                    Cow Ensembl Protein ID
428                                       Cow Chromosome Name
429                                 Cow Chromosome Start (bp)
430                                   Cow Chromosome End (bp)
431                                             Homology Type
432                                                  Ancestor
433                      Orthology confidence [0 low, 1 high]
434                     % Identity with respect to query gene
435                       % Identity with respect to Cow gene
436                                                        dN
437                                                        dS
438                                       Dog Ensembl Gene ID
439                        Canonical Protein or Transcript ID
440                                    Dog Ensembl Protein ID
441                                       Dog Chromosome Name
442                                 Dog Chromosome Start (bp)
443                                   Dog Chromosome End (bp)
444                                             Homology Type
445                                                  Ancestor
446                      Orthology confidence [0 low, 1 high]
447                     % Identity with respect to query gene
448                       % Identity with respect to Dog gene
449                                                        dN
450                                                        dS
451                                   Dolphin Ensembl Gene ID
452                        Canonical Protein or Transcript ID
453                                Dolphin Ensembl Protein ID
454                                   Dolphin Chromosome Name
455                             Dolphin Chromosome Start (bp)
456                               Dolphin Chromosome End (bp)
457                                             Homology Type
458                                                  Ancestor
459                      Orthology confidence [0 low, 1 high]
460                     % Identity with respect to query gene
461                   % Identity with respect to Dolphin gene
462                                Drosophila Ensembl Gene ID
463                        Canonical Protein or Transcript ID
464                             Drosophila Ensembl Protein ID
465                                Drosophila Chromosome Name
466                          Drosophila Chromosome Start (bp)
467                            Drosophila Chromosome End (bp)
468                                             Homology Type
469                                                  Ancestor
470                      Orthology confidence [0 low, 1 high]
471                     % Identity with respect to query gene
472                % Identity with respect to Drosophila gene
473                                                        dN
474                                                        dS
475                                      Duck Ensembl Gene ID
476                        Canonical Protein or Transcript ID
477                                   Duck Ensembl Protein ID
478                                      Duck Chromosome Name
479                                Duck Chromosome Start (bp)
480                                  Duck Chromosome End (bp)
481                                             Homology Type
482                                                  Ancestor
483                      Orthology confidence [0 low, 1 high]
484                     % Identity with respect to query gene
485                      % Identity with respect to Duck gene
486                                                        dN
487                                                        dS
488                                  Elephant Ensembl Gene ID
489                        Canonical Protein or Transcript ID
490                               Elephant Ensembl Protein ID
491                                  Elephant Chromosome Name
492                            Elephant Chromosome Start (bp)
493                              Elephant Chromosome End (bp)
494                                             Homology Type
495                                                  Ancestor
496                      Orthology confidence [0 low, 1 high]
497                     % Identity with respect to query gene
498                  % Identity with respect to Elephant gene
499                                                        dN
500                                                        dS
501                                    Ferret Ensembl Gene ID
502                        Canonical Protein or Transcript ID
503                                 Ferret Ensembl Protein ID
504                                    Ferret Chromosome Name
505                              Ferret Chromosome Start (bp)
506                                Ferret Chromosome End (bp)
507                                             Homology Type
508                                                  Ancestor
509                      Orthology confidence [0 low, 1 high]
510                     % Identity with respect to query gene
511                    % Identity with respect to Ferret gene
512                                                        dN
513                                                        dS
514                                Flycatcher Ensembl Gene ID
515                        Canonical Protein or Transcript ID
516                             Flycatcher Ensembl Protein ID
517                                Flycatcher Chromosome Name
518                          Flycatcher Chromosome Start (bp)
519                            Flycatcher Chromosome End (bp)
520                                             Homology Type
521                                                  Ancestor
522                      Orthology confidence [0 low, 1 high]
523                     % Identity with respect to query gene
524                % Identity with respect to Flycatcher gene
525                                                        dN
526                                                        dS
527                                      Fugu Ensembl Gene ID
528                        Canonical Protein or Transcript ID
529                                   Fugu Ensembl Protein ID
530                                      Fugu Chromosome Name
531                                Fugu Chromosome Start (bp)
532                                  Fugu Chromosome End (bp)
533                                             Homology Type
534                                                  Ancestor
535                      Orthology confidence [0 low, 1 high]
536                     % Identity with respect to query gene
537                      % Identity with respect to Fugu gene
538                                                        dN
539                                                        dS
540                                    Gibbon Ensembl Gene ID
541                        Canonical Protein or Transcript ID
542                                 Gibbon Ensembl Protein ID
543                                    Gibbon Chromosome Name
544                              Gibbon Chromosome Start (bp)
545                                Gibbon Chromosome End (bp)
546                                             Homology Type
547                                                  Ancestor
548                      Orthology confidence [0 low, 1 high]
549                     % Identity with respect to query gene
550                    % Identity with respect to Gibbon gene
551                                                        dN
552                                                        dS
553                                   Gorilla Ensembl Gene ID
554                        Canonical Protein or Transcript ID
555                                Gorilla Ensembl Protein ID
556                                   Gorilla Chromosome Name
557                             Gorilla Chromosome Start (bp)
558                               Gorilla Chromosome End (bp)
559                                             Homology Type
560                                                  Ancestor
561                      Orthology confidence [0 low, 1 high]
562                     % Identity with respect to query gene
563                   % Identity with respect to Gorilla gene
564                                                        dN
565                                                        dS
566                                Guinea Pig Ensembl Gene ID
567                        Canonical Protein or Transcript ID
568                             Guinea Pig Ensembl Protein ID
569                                Guinea Pig Chromosome Name
570                          Guinea Pig Chromosome Start (bp)
571                            Guinea Pig Chromosome End (bp)
572                                             Homology Type
573                                                  Ancestor
574                      Orthology confidence [0 low, 1 high]
575                     % Identity with respect to query gene
576                % Identity with respect to Guinea Pig gene
577                                                        dN
578                                                        dS
579                                  Hedgehog Ensembl Gene ID
580                        Canonical Protein or Transcript ID
581                               Hedgehog Ensembl Protein ID
582                                  Hedgehog Chromosome Name
583                            Hedgehog Chromosome Start (bp)
584                              Hedgehog Chromosome End (bp)
585                                             Homology Type
586                                                  Ancestor
587                      Orthology confidence [0 low, 1 high]
588                     % Identity with respect to query gene
589                  % Identity with respect to Hedgehog gene
590                                     Horse Ensembl Gene ID
591                        Canonical Protein or Transcript ID
592                                  Horse Ensembl Protein ID
593                                     Horse Chromosome Name
594                               Horse Chromosome Start (bp)
595                                 Horse Chromosome End (bp)
596                                             Homology Type
597                                                  Ancestor
598                      Orthology confidence [0 low, 1 high]
599                     % Identity with respect to query gene
600                     % Identity with respect to Horse gene
601                                                        dN
602                                                        dS
603                              Kangaroo Rat Ensembl Gene ID
604                        Canonical Protein or Transcript ID
605                           Kangaroo Rat Ensembl Protein ID
606                              Kangaroo Rat Chromosome Name
607                        Kangaroo Rat Chromosome Start (bp)
608                          Kangaroo Rat Chromosome End (bp)
609                                             Homology Type
610                                                  Ancestor
611                      Orthology confidence [0 low, 1 high]
612                     % Identity with respect to query gene
613              % Identity with respect to Kangaroo Rat gene
614                                   Lamprey Ensembl Gene ID
615                        Canonical Protein or Transcript ID
616                                Lamprey Ensembl Protein ID
617                                   Lamprey Chromosome Name
618                             Lamprey Chromosome Start (bp)
619                               Lamprey Chromosome End (bp)
620                                             Homology Type
621                                                  Ancestor
622                      Orthology confidence [0 low, 1 high]
623                     % Identity with respect to query gene
624                   % Identity with respect to Lamprey gene
625                    Lesser hedgehog tenrec Ensembl Gene ID
626                        Canonical Protein or Transcript ID
627                 Lesser hedgehog tenrec Ensembl Protein ID
628                    Lesser hedgehog tenrec Chromosome Name
629              Lesser hedgehog tenrec Chromosome Start (bp)
630                Lesser hedgehog tenrec Chromosome End (bp)
631                                             Homology Type
632                                                  Ancestor
633                      Orthology confidence [0 low, 1 high]
634                     % Identity with respect to query gene
635    % Identity with respect to Lesser hedgehog tenrec gene
636                                   Macaque Ensembl Gene ID
637                        Canonical Protein or Transcript ID
638                                Macaque Ensembl Protein ID
639                                   Macaque Chromosome Name
640                             Macaque Chromosome Start (bp)
641                               Macaque Chromosome End (bp)
642                                             Homology Type
643                                                  Ancestor
644                      Orthology confidence [0 low, 1 high]
645                     % Identity with respect to query gene
646                   % Identity with respect to Macaque gene
647                                                        dN
648                                                        dS
649                                  Marmoset Ensembl Gene ID
650                        Canonical Protein or Transcript ID
651                               Marmoset Ensembl Protein ID
652                                  Marmoset Chromosome Name
653                            Marmoset Chromosome Start (bp)
654                              Marmoset Chromosome End (bp)
655                                             Homology Type
656                                                  Ancestor
657                      Orthology confidence [0 low, 1 high]
658                     % Identity with respect to query gene
659                  % Identity with respect to Marmoset gene
660                                                        dN
661                                                        dS
662                                    Medaka Ensembl Gene ID
663                        Canonical Protein or Transcript ID
664                                 Medaka Ensembl Protein ID
665                                    Medaka Chromosome Name
666                              Medaka Chromosome Start (bp)
667                                Medaka Chromosome End (bp)
668                                             Homology Type
669                                                  Ancestor
670                      Orthology confidence [0 low, 1 high]
671                     % Identity with respect to query gene
672                    % Identity with respect to Medaka gene
673                                                        dN
674                                                        dS
675                                   Megabat Ensembl Gene ID
676                        Canonical Protein or Transcript ID
677                                Megabat Ensembl Protein ID
678                                   Megabat Chromosome Name
679                             Megabat Chromosome Start (bp)
680                               Megabat Chromosome End (bp)
681                                             Homology Type
682                                                  Ancestor
683                      Orthology confidence [0 low, 1 high]
684                     % Identity with respect to query gene
685                   % Identity with respect to Megabat gene
686                                  Microbat Ensembl Gene ID
687                        Canonical Protein or Transcript ID
688                               Microbat Ensembl Protein ID
689                                  Microbat Chromosome Name
690                            Microbat Chromosome Start (bp)
691                              Microbat Chromosome End (bp)
692                                             Homology Type
693                                                  Ancestor
694                      Orthology confidence [0 low, 1 high]
695                     % Identity with respect to query gene
696                  % Identity with respect to Microbat gene
697                                                        dN
698                                                        dS
699                                     Mouse Ensembl Gene ID
700                        Canonical Protein or Transcript ID
701                                  Mouse Ensembl Protein ID
702                                     Mouse Chromosome Name
703                               Mouse Chromosome Start (bp)
704                                 Mouse Chromosome End (bp)
705                                             Homology Type
706                                                  Ancestor
707                      Orthology confidence [0 low, 1 high]
708                     % Identity with respect to query gene
709                     % Identity with respect to Mouse gene
710                                                        dN
711                                                        dS
712                               Mouse Lemur Ensembl Gene ID
713                        Canonical Protein or Transcript ID
714                            Mouse Lemur Ensembl Protein ID
715                               Mouse Lemur Chromosome Name
716                         Mouse Lemur Chromosome Start (bp)
717                           Mouse Lemur Chromosome End (bp)
718                                             Homology Type
719                                                  Ancestor
720                      Orthology confidence [0 low, 1 high]
721                     % Identity with respect to query gene
722               % Identity with respect to Mouse Lemur gene
723                              Nile tilapia Ensembl Gene ID
724                        Canonical Protein or Transcript ID
725                           Nile tilapia Ensembl Protein ID
726                              Nile tilapia Chromosome Name
727                        Nile tilapia Chromosome Start (bp)
728                          Nile tilapia Chromosome End (bp)
729                                             Homology Type
730                                                  Ancestor
731                      Orthology confidence [0 low, 1 high]
732                     % Identity with respect to query gene
733              % Identity with respect to Nile tilapia gene
734                              Olive baboon Ensembl Gene ID
735                        Canonical Protein or Transcript ID
736                           Olive baboon Ensembl Protein ID
737                              Olive baboon Chromosome Name
738                        Olive baboon Chromosome Start (bp)
739                          Olive baboon Chromosome End (bp)
740                                             Homology Type
741                                                  Ancestor
742                      Orthology confidence [0 low, 1 high]
743                     % Identity with respect to query gene
744              % Identity with respect to Olive baboon gene
745                                                        dN
746                                                        dS
747                                   Opossum Ensembl Gene ID
748                        Canonical Protein or Transcript ID
749                                Opossum Ensembl Protein ID
750                                   Opossum Chromosome Name
751                             Opossum Chromosome Start (bp)
752                               Opossum Chromosome End (bp)
753                                             Homology Type
754                                                  Ancestor
755                      Orthology confidence [0 low, 1 high]
756                     % Identity with respect to query gene
757                   % Identity with respect to Opossum gene
758                                                        dN
759                                                        dS
760                                 Orangutan Ensembl Gene ID
761                        Canonical Protein or Transcript ID
762                              Orangutan Ensembl Protein ID
763                                 Orangutan Chromosome Name
764                           Orangutan Chromosome Start (bp)
765                             Orangutan Chromosome End (bp)
766                                             Homology Type
767                                                  Ancestor
768                      Orthology confidence [0 low, 1 high]
769                     % Identity with respect to query gene
770                 % Identity with respect to Orangutan gene
771                                                        dN
772                                                        dS
773                                     Panda Ensembl Gene ID
774                        Canonical Protein or Transcript ID
775                                  Panda Ensembl Protein ID
776                                     Panda Chromosome Name
777                               Panda Chromosome Start (bp)
778                                 Panda Chromosome End (bp)
779                                             Homology Type
780                                                  Ancestor
781                      Orthology confidence [0 low, 1 high]
782                     % Identity with respect to query gene
783                     % Identity with respect to Panda gene
784                                                        dN
785                                                        dS
786                                       Pig Ensembl Gene ID
787                        Canonical Protein or Transcript ID
788                                    Pig Ensembl Protein ID
789                                       Pig Chromosome Name
790                                 Pig Chromosome Start (bp)
791                                   Pig Chromosome End (bp)
792                                             Homology Type
793                                                  Ancestor
794                      Orthology confidence [0 low, 1 high]
795                     % Identity with respect to query gene
796                       % Identity with respect to Pig gene
797                                                        dN
798                                                        dS
799                                      Pika Ensembl Gene ID
800                        Canonical Protein or Transcript ID
801                                   Pika Ensembl Protein ID
802                                      Pika Chromosome Name
803                                Pika Chromosome Start (bp)
804                                  Pika Chromosome End (bp)
805                                             Homology Type
806                                                  Ancestor
807                      Orthology confidence [0 low, 1 high]
808                     % Identity with respect to query gene
809                      % Identity with respect to Pika gene
810                                 Platyfish Ensembl Gene ID
811                        Canonical Protein or Transcript ID
812                              Platyfish Ensembl Protein ID
813                                 Platyfish Chromosome Name
814                           Platyfish Chromosome Start (bp)
815                             Platyfish Chromosome End (bp)
816                                             Homology Type
817                                                  Ancestor
818                      Orthology confidence [0 low, 1 high]
819                     % Identity with respect to query gene
820                 % Identity with respect to Platyfish gene
821                                  Platypus Ensembl Gene ID
822                        Canonical Protein or Transcript ID
823                               Platypus Ensembl Protein ID
824                                  Platypus Chromosome Name
825                            Platypus Chromosome Start (bp)
826                              Platypus Chromosome End (bp)
827                                             Homology Type
828                                                  Ancestor
829                      Orthology confidence [0 low, 1 high]
830                     % Identity with respect to query gene
831                  % Identity with respect to Platypus gene
832                                                        dN
833                                                        dS
834                                    Rabbit Ensembl Gene ID
835                        Canonical Protein or Transcript ID
836                                 Rabbit Ensembl Protein ID
837                                    Rabbit Chromosome Name
838                              Rabbit Chromosome Start (bp)
839                                Rabbit Chromosome End (bp)
840                                             Homology Type
841                                                  Ancestor
842                      Orthology confidence [0 low, 1 high]
843                     % Identity with respect to query gene
844                    % Identity with respect to Rabbit gene
845                                                        dN
846                                                        dS
847                                       Rat Ensembl Gene ID
848                        Canonical Protein or Transcript ID
849                                    Rat Ensembl Protein ID
850                                       Rat Chromosome Name
851                                 Rat Chromosome Start (bp)
852                                   Rat Chromosome End (bp)
853                                             Homology Type
854                                                  Ancestor
855                      Orthology confidence [0 low, 1 high]
856                     % Identity with respect to query gene
857                       % Identity with respect to Rat gene
858                                                        dN
859                                                        dS
860                                Rock Hyrax Ensembl Gene ID
861                        Canonical Protein or Transcript ID
862                             Rock Hyrax Ensembl Protein ID
863                                Rock Hyrax Chromosome Name
864                          Rock Hyrax Chromosome Start (bp)
865                            Rock Hyrax Chromosome End (bp)
866                                             Homology Type
867                                                  Ancestor
868                      Orthology confidence [0 low, 1 high]
869                     % Identity with respect to query gene
870                % Identity with respect to Rock Hyrax gene
871                                     Sheep Ensembl Gene ID
872                        Canonical Protein or Transcript ID
873                                  Sheep Ensembl Protein ID
874                                     Sheep Chromosome Name
875                               Sheep Chromosome Start (bp)
876                                 Sheep Chromosome End (bp)
877                                             Homology Type
878                                                  Ancestor
879                      Orthology confidence [0 low, 1 high]
880                     % Identity with respect to query gene
881                     % Identity with respect to Sheep gene
882                                                        dN
883                                                        dS
884                                     Sloth Ensembl Gene ID
885                        Canonical Protein or Transcript ID
886                                  Sloth Ensembl Protein ID
887                                     Sloth Chromosome Name
888                               Sloth Chromosome Start (bp)
889                                 Sloth Chromosome End (bp)
890                                             Homology Type
891                                                  Ancestor
892                      Orthology confidence [0 low, 1 high]
893                     % Identity with respect to query gene
894                     % Identity with respect to Sloth gene
895                               Spotted gar Ensembl Gene ID
896                        Canonical Protein or Transcript ID
897                            Spotted gar Ensembl Protein ID
898                               Spotted gar Chromosome Name
899                         Spotted gar Chromosome Start (bp)
900                           Spotted gar Chromosome End (bp)
901                                             Homology Type
902                                                  Ancestor
903                      Orthology confidence [0 low, 1 high]
904                     % Identity with respect to query gene
905               % Identity with respect to Spotted gar gene
906                                  Squirrel Ensembl Gene ID
907                        Canonical Protein or Transcript ID
908                               Squirrel Ensembl Protein ID
909                                  Squirrel Chromosome Name
910                            Squirrel Chromosome Start (bp)
911                              Squirrel Chromosome End (bp)
912                                             Homology Type
913                                                  Ancestor
914                      Orthology confidence [0 low, 1 high]
915                     % Identity with respect to query gene
916                  % Identity with respect to Squirrel gene
917                                                        dN
918                                                        dS
919                               Stickleback Ensembl Gene ID
920                        Canonical Protein or Transcript ID
921                            Stickleback Ensembl Protein ID
922                               Stickleback Chromosome Name
923                         Stickleback Chromosome Start (bp)
924                           Stickleback Chromosome End (bp)
925                                             Homology Type
926                                                  Ancestor
927                      Orthology confidence [0 low, 1 high]
928                     % Identity with respect to query gene
929               % Identity with respect to Stickleback gene
930                                                        dN
931                                                        dS
932                                   Tarsier Ensembl Gene ID
933                        Canonical Protein or Transcript ID
934                                Tarsier Ensembl Protein ID
935                                   Tarsier Chromosome Name
936                             Tarsier Chromosome Start (bp)
937                               Tarsier Chromosome End (bp)
938                                             Homology Type
939                                                  Ancestor
940                      Orthology confidence [0 low, 1 high]
941                     % Identity with respect to query gene
942                   % Identity with respect to Tarsier gene
943                           Tasmanian Devil Ensembl Gene ID
944                        Canonical Protein or Transcript ID
945                        Tasmanian Devil Ensembl Protein ID
946                           Tasmanian Devil Chromosome Name
947                     Tasmanian Devil Chromosome Start (bp)
948                       Tasmanian Devil Chromosome End (bp)
949                                             Homology Type
950                                                  Ancestor
951                      Orthology confidence [0 low, 1 high]
952                     % Identity with respect to query gene
953           % Identity with respect to Tasmanian Devil gene
954                                                        dN
955                                                        dS
956                                 Tetraodon Ensembl Gene ID
957                        Canonical Protein or Transcript ID
958                              Tetraodon Ensembl Protein ID
959                                 Tetraodon Chromosome Name
960                           Tetraodon Chromosome Start (bp)
961                             Tetraodon Chromosome End (bp)
962                                             Homology Type
963                                                  Ancestor
964                      Orthology confidence [0 low, 1 high]
965                     % Identity with respect to query gene
966                 % Identity with respect to Tetraodon gene
967                                                        dN
968                                                        dS
969                                Tree Shrew Ensembl Gene ID
970                        Canonical Protein or Transcript ID
971                             Tree Shrew Ensembl Protein ID
972                                Tree Shrew Chromosome Name
973                          Tree Shrew Chromosome Start (bp)
974                            Tree Shrew Chromosome End (bp)
975                                             Homology Type
976                                                  Ancestor
977                      Orthology confidence [0 low, 1 high]
978                     % Identity with respect to query gene
979                % Identity with respect to Tree Shrew gene
980                                    Turkey Ensembl Gene ID
981                        Canonical Protein or Transcript ID
982                                 Turkey Ensembl Protein ID
983                                    Turkey Chromosome Name
984                              Turkey Chromosome Start (bp)
985                                Turkey Chromosome End (bp)
986                                             Homology Type
987                                                  Ancestor
988                      Orthology confidence [0 low, 1 high]
989                     % Identity with respect to query gene
990                    % Identity with respect to Turkey gene
991                                                        dN
992                                                        dS
993                                Vervet-AGM Ensembl Gene ID
994                        Canonical Protein or Transcript ID
995                             Vervet-AGM Ensembl Protein ID
996                                Vervet-AGM Chromosome Name
997                          Vervet-AGM Chromosome Start (bp)
998                            Vervet-AGM Chromosome End (bp)
999                                             Homology Type
1000                                                 Ancestor
1001                     Orthology confidence [0 low, 1 high]
1002                    % Identity with respect to query gene
1003               % Identity with respect to Vervet-AGM gene
1004                                                       dN
1005                                                       dS
1006                                  Wallaby Ensembl Gene ID
1007                       Canonical Protein or Transcript ID
1008                               Wallaby Ensembl Protein ID
1009                                  Wallaby Chromosome Name
1010                            Wallaby Chromosome Start (bp)
1011                              Wallaby Chromosome End (bp)
1012                                            Homology Type
1013                                                 Ancestor
1014                     Orthology confidence [0 low, 1 high]
1015                    % Identity with respect to query gene
1016                  % Identity with respect to Wallaby gene
1017                                  Xenopus Ensembl Gene ID
1018                       Canonical Protein or Transcript ID
1019                               Xenopus Ensembl Protein ID
1020                                  Xenopus Chromosome Name
1021                            Xenopus Chromosome Start (bp)
1022                              Xenopus Chromosome End (bp)
1023                                            Homology Type
1024                                                 Ancestor
1025                     Orthology confidence [0 low, 1 high]
1026                    % Identity with respect to query gene
1027                  % Identity with respect to Xenopus gene
1028                                                       dN
1029                                                       dS
1030                                    Yeast Ensembl Gene ID
1031                       Canonical Protein or Transcript ID
1032                                 Yeast Ensembl Protein ID
1033                                    Yeast Chromosome Name
1034                              Yeast Chromosome Start (bp)
1035                                Yeast Chromosome End (bp)
1036                                            Homology Type
1037                                                 Ancestor
1038                     Orthology confidence [0 low, 1 high]
1039                    % Identity with respect to query gene
1040                    % Identity with respect to Yeast gene
1041                                                       dN
1042                                                       dS
1043                              Zebra Finch Ensembl Gene ID
1044                       Canonical Protein or Transcript ID
1045                           Zebra Finch Ensembl Protein ID
1046                              Zebra Finch Chromosome Name
1047                        Zebra Finch Chromosome Start (bp)
1048                          Zebra Finch Chromosome End (bp)
1049                                            Homology Type
1050                                                 Ancestor
1051                     Orthology confidence [0 low, 1 high]
1052                    % Identity with respect to query gene
1053              % Identity with respect to Zebra Finch gene
1054                                                       dN
1055                                                       dS
1056                                Zebrafish Ensembl Gene ID
1057                       Canonical Protein or Transcript ID
1058                             Zebrafish Ensembl Protein ID
1059                                Zebrafish Chromosome Name
1060                          Zebrafish Chromosome Start (bp)
1061                            Zebrafish Chromosome End (bp)
1062                                            Homology Type
1063                                                 Ancestor
1064                     Orthology confidence [0 low, 1 high]
1065                    % Identity with respect to query gene
1066                % Identity with respect to Zebrafish gene
1067                                                       dN
1068                                                       dS
1069                            Human Paralog Ensembl Gene ID
1070                       Canonical Protein or Transcript ID
1071                         Human Paralog Ensembl Protein ID
1072                            Human Paralog Chromosome Name
1073                             Human Paralog Chr Start (bp)
1074                               Human Paralog Chr End (bp)
1075                                            Homology Type
1076                                                 Ancestor
1077                      Paralogy confidence [0 low, 1 high]
1078                    % Identity with respect to query gene
1079                    % Identity with respect to Human gene
1080                                                       dN
1081                                                       dS
1082                                          Ensembl Gene ID
1083                                    Ensembl Transcript ID
1084                                       Ensembl Protein ID
1085                                          Chromosome Name
1086                                          Gene Start (bp)
1087                                            Gene End (bp)
1088                                                   Strand
1089                                                     Band
1090                                     Associated Gene Name
1091                                   Associated Gene Source
1092                                         Transcript count
1093                                             % GC content
1094                                              Description
1095                                             Variant Name
1096                                           Variant Source
1097                               Variant source description
1098                                          Variant Alleles
1099                              Variant supporting evidence
1100                                                Mapweight
1101                                             Minor allele
1102                                   Minor allele frequency
1103                                       Minor allele count
1104                                    Clinical significance
1105                                 Transcript location (bp)
1106                                Variant Chromosome Strand
1107                                    Protein location (aa)
1108                           Chromosome position start (bp)
1109                             Chromosome position end (bp)
1110                                      PolyPhen prediction
1111                                           PolyPhen score
1112                                          SIFT prediction
1113                                               SIFT score
1114                                   Distance to transcript
1115                                                CDS Start
1116                                                  CDS End
1117                                           Protein Allele
1118                                      Variant Consequence
1119                              Consequence specific allele
1120                                          Ensembl Gene ID
1121                                    Ensembl Transcript ID
1122                                       Ensembl Protein ID
1123                                          Chromosome Name
1124                                          Gene Start (bp)
1125                                            Gene End (bp)
1126                                                   Strand
1127                                                     Band
1128                                     Associated Gene Name
1129                                   Associated Gene Source
1130                                         Transcript count
1131                                             % GC content
1132                                              Description
1133                                             Variant Name
1134                                           Variant Source
1135                               Variant source description
1136                                          Variant Alleles
1137                              Variant supporting evidence
1138                                                Mapweight
1139                                             Minor allele
1140                                   Minor allele frequency
1141                                       Minor allele count
1142                                    Clinical significance
1143                                 Transcript location (bp)
1144                                Variant Chromosome Strand
1145                                    Protein location (aa)
1146                           Chromosome position start (bp)
1147                             Chromosome position end (bp)
1148                                      PolyPhen prediction
1149                                           PolyPhen score
1150                                          SIFT prediction
1151                                               SIFT score
1152                                   Distance to transcript
1153                                                CDS Start
1154                                                  CDS End
1155                                           Protein Allele
1156                                      Variant Consequence
1157                              Consequence specific allele
1158                                   Unspliced (Transcript)
1159                                         Unspliced (Gene)
1160                                       Flank (Transcript)
1161                                             Flank (Gene)
1162                         Flank-coding region (Transcript)
1163                               Flank-coding region (Gene)
1164                                                   5' UTR
1165                                                   3' UTR
1166                                           Exon sequences
1167                                           cDNA sequences
1168                                          Coding sequence
1169                                                  Protein
1170                                           upstream_flank
1171                                         downstream_flank
1172                                          Ensembl Gene ID
1173                                              Description
1174                                     Associated Gene Name
1175                                   Associated Gene Source
1176                                          Chromosome Name
1177                                          Gene Start (bp)
1178                                            Gene End (bp)
1179                                                Gene type
1180                             Ensembl Protein Family ID(s)
1181                                  CDS start (within cDNA)
1182                                    CDS end (within cDNA)
1183                                             5' UTR Start
1184                                               5' UTR End
1185                                             3' UTR Start
1186                                               3' UTR End
1187                                    Ensembl Transcript ID
1188                                       Ensembl Protein ID
1189                                          Transcript type
1190                                                   Strand
1191                                    Transcript Start (bp)
1192                                      Transcript End (bp)
1193                           Transcription Start Site (TSS)
1194               Transcript length (including UTRs and CDS)
1195                                               CDS Length
1196                                                CDS Start
1197                                                  CDS End
1198                                          Ensembl Exon ID
1199                                      Exon Chr Start (bp)
1200                                        Exon Chr End (bp)
1201                                                   Strand
1202                                  Exon Rank in Transcript
1203                                                    phase
1204                                        cDNA coding start
1205                                          cDNA coding end
1206                                     Genomic coding start
1207                                       Genomic coding end
1208                                        Constitutive Exon
In [33]:
%%R
attri=c("ensembl_gene_id","go_id","name_1006")
res=getBM(attributes=attri,mart = ensembl)
In [34]:
print datetime.now()
2016-10-29 12:47:17.753866
In [35]:
%Rpull res
def CombineAnn(df):
     return pd.Series(dict(ensembl_gene_id = ', '.join([ s for s in list(set(df['ensembl_gene_id']))  if len(s) > 1 ] ) , 
                        go_id = ', '.join([ s for s in list(set(df['go_id'])) if len(s) > 1 ]),
                        name_1006 = ', '.join([ s for s in list(set(df['name_1006'])) if len(s) > 1 ] )   ) ) 

cols=res.columns.tolist()

res_Genes=res.groupby("ensembl_gene_id").apply(CombineAnn)
res_Genes.reset_index(inplace=True,drop=True)

res_Genes.to_csv(outFolder+"GenesAnnotations.tsv",sep="\t",index=None)
In [36]:
res_Genes=pd.read_table(outFolder+"GenesAnnotations.tsv")
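
The collapsing done by CombineAnn above (many rows per gene reduced to one row with comma-separated annotations) can be tried on a toy table. This is only a minimal sketch and not the notebook's own code: the gene and GO identifiers are invented, and `collapse` is a hypothetical helper that sorts the unique values before joining them so that the output is deterministic.

```python
import pandas as pd

# Toy annotation table: one row per (gene, GO term) pair. All values are invented.
toy = pd.DataFrame({
    "ensembl_gene_id": ["ENSG_A", "ENSG_A", "ENSG_B", "ENSG_B"],
    "go_id": ["GO:1", "GO:2", "GO:1", "GO:1"],
    "name_1006": ["term one", "term two", "term one", "term one"],
})

def collapse(values):
    """Join the sorted unique non-missing values of one column into a string."""
    unique = sorted(set(str(v) for v in values if pd.notnull(v) and str(v) != ""))
    return ", ".join(unique)

# Split by gene, apply `collapse` to each annotation column, combine the results.
collapsed = (toy.groupby("ensembl_gene_id")
                .agg({"go_id": collapse, "name_1006": collapse})
                .reset_index())
print(collapsed.to_string(index=False))
```

Using `agg` with one function per column keeps the grouping key out of the applied function, which is a design choice of this sketch rather than something the notebook requires.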
In [37]:
print datetime.now()
def AnnotateDF(df,refcol,annTable=res_Genes,parsedGTF=parsedGTF,dropGeneID=False):
    """
    Annotates a dataframe.
    
    :param df: a Pandas dataframe to be annotated
    :param refcol: the header of the column containing the ids to be annotated
    :param annTable: a table with annotations
    :param parsedGTF: a parsed GTF as output by parseGTF()
    :param dropGeneID: if you are merging on transcript_id and the df already has gene_id in the header set this to True
    
    :returns: a Pandas dataframe
    """ 
    
    df=df.copy()
    parsedGTF_=parsedGTF[['gene_id','gene_name','transcript_id','gene_type']].astype(str).drop_duplicates()
    parsedGTF_=parsedGTF_[parsedGTF_[refcol].astype(str)!="nan"]
    def CombineAnn(df):
            return pd.Series(dict(gene_id = ', '.join([ s for s in list(set(df['gene_id']))  if s != "nan" ] ) , 
                        gene_name = ', '.join([ s for s in list(set(df['gene_name'])) if s != "nan" ]),
                        transcript_id = ', '.join([ s for s in list(set(df['transcript_id'])) if s != "nan" ] ) ,
                        gene_type = ', '.join([ s for s in list(set(df['gene_type'])) if s != "nan" ] ) ) ) 
    
    # As defined in http://pandas.pydata.org/pandas-docs/stable/groupby.html
    # By “group by” we are referring to a process involving one or more of the following steps
    # Splitting the data into groups based on some criteria
    # Applying a function to each group independently
    # Combining the results into a data structure
    id_name=parsedGTF_.groupby(refcol).apply(CombineAnn)
    
    if dropGeneID:
        id_name=id_name.drop(["gene_id"],axis=1)
    df=pd.merge(df,id_name, on=refcol, how="left")
    bads=[s for s in ['gene_id','transcript_id'] if s != refcol][0]
    dfBads=df[df[bads].astype(str)=="nan"]
    if len(dfBads) > 0:
        bads_=dfBads[refcol].tolist()
        bads_=[ str(s) for s in bads_ ]
        bads_="\n".join(bads_)
        print refcol
        print "For the following %s no %s could be found:\n%s" %(refcol, bads, bads_)
        sys.stdout.flush()
    df["ensembl_gene_id"]=df["gene_id"].apply(lambda x: x.split(".")[0])
    df=pd.merge(df,annTable,on="ensembl_gene_id", how="left")
    return df

dfGenesAn=AnnotateDF(dfGenesWout,"gene_id")
dfTranscriptsAn=AnnotateDF(dfTranscriptsWout,"transcript_id")
dfTargetsAn=AnnotateDF(dfTargets,"transcript_id",dropGeneID=True)
2016-10-29 12:47:29.040724
gene_id
For the following gene_id no transcript_id could be found:
gSpikein_ERCC-00076
gSpikein_ERCC-00061
gSpikein_ERCC-00126
gSpikein_ERCC-00168
gSpikein_ERCC-00098
gSpikein_ERCC-00112
gSpikein_ERCC-00097
gSpikein_ERCC-00074
gSpikein_ERCC-00077
gSpikein_ERCC-00004
gSpikein_ERCC-00054
gSpikein_ERCC-00083
gSpikein_ERCC-00069
gSpikein_ERCC-00148
gSpikein_ERCC-00162
gSpikein_ERCC-00144
gSpikein_ERCC-00171
gSpikein_ERCC-00095
gSpikein_ERCC-00017
gSpikein_ERCC-00067
gSpikein_ERCC-00145
gSpikein_ERCC-00024
gSpikein_ERCC-00123
gSpikein_ERCC-00044
gSpikein_ERCC-00043
gSpikein_ERCC-00062
gSpikein_ERCC-00138
gSpikein_ERCC-00131
gSpikein_ERCC-00012
gSpikein_ERCC-00073
gSpikein_ERCC-00051
gSpikein_ERCC-00158
gSpikein_ERCC-00157
gSpikein_ERCC-00039
gSpikein_ERCC-00111
gSpikein_ERCC-00108
gSpikein_ERCC-00078
gSpikein_ERCC-00096
gSpikein_ERCC-00075
gSpikein_ERCC-00160
gSpikein_ERCC-00081
gSpikein_ERCC-00002
gSpikein_ERCC-00060
gSpikein_ERCC-00134
gSpikein_ERCC-00104
gSpikein_ERCC-00137
gSpikein_ERCC-00019
gSpikein_ERCC-00120
gSpikein_ERCC-00025
gSpikein_ERCC-00046
gSpikein_ERCC-00143
gSpikein_ERCC-00085
gSpikein_ERCC-00042
gSpikein_phiX174
gSpikein_ERCC-00033
gSpikein_ERCC-00092
gSpikein_ERCC-00156
gSpikein_ERCC-00142
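
The message above comes from checking which identifiers received no annotation after a left merge; the listed ids are spike-in controls, which presumably have no entry in the parsed GTF. The same check can be sketched on toy data as follows. The tables and ids below are invented for illustration and do not come from the notebook.

```python
import pandas as pd

# Invented expression table; "spike_1" has no entry in the annotation table.
counts = pd.DataFrame({
    "gene_id": ["G1.1", "G2.3", "spike_1"],
    "shRNA": [10.0, 5.0, 7.0],
})
annotation = pd.DataFrame({
    "gene_id": ["G1.1", "G2.3"],
    "gene_name": ["geneA", "geneB"],
})

# A left merge keeps every row of `counts`; missing annotations become NaN.
merged = pd.merge(counts, annotation, on="gene_id", how="left")

# Rows whose annotation column is missing are the ids that could not be matched.
unmatched = merged.loc[merged["gene_name"].isnull(), "gene_id"].tolist()
print("ids without annotation: %s" % ", ".join(unmatched))
```

Because the merge is a left join, unmatched rows are kept rather than dropped, so they still appear in the downstream tables with empty annotation fields.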

9. Clustering and heatmaps

We start by preparing a dataframe with log10(expression) values. We merge gene_id and gene_name into a single label for each gene, as together they are more informative than either one alone while remaining non-redundant.

In [38]:
print datetime.now()
forHeatmap=dfGenesAn[dfGenesAn["sig"]=="yes"][["gene_id","gene_name","shRNA","control"]]
forHeatmap["labels"]=forHeatmap["gene_name"].astype(str)+"_"+forHeatmap["gene_id"].astype(str)
for f in ["shRNA","control"]:
    forHeatmap[f]=forHeatmap[f].apply(lambda x: np.log10(x))
forHeatmap_=forHeatmap.copy()
forHeatmap.index=forHeatmap["labels"]
forHeatmap=forHeatmap.drop(["labels","gene_id","gene_name"],axis=1)
2016-10-29 12:51:06.845392
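
As a small illustration of the preparation step above, the following sketch builds the combined labels and log10-transforms two expression columns of a toy table. Gene names, ids and counts are invented. The sketch assumes strictly positive values: a count of zero would give -inf under log10, which the cell above does not guard against.

```python
import numpy as np
import pandas as pd

# Invented expression values for two genes.
toy = pd.DataFrame({
    "gene_id": ["G1.1", "G2.3"],
    "gene_name": ["geneA", "geneB"],
    "shRNA": [10.0, 1000.0],
    "control": [100.0, 1.0],
})

# Combine name and id into one label per gene, as done for the heatmap rows.
toy["labels"] = toy["gene_name"].astype(str) + "_" + toy["gene_id"].astype(str)

# log10-transform both expression columns at once and index the rows by label.
log_values = np.log10(toy[["shRNA", "control"]])
log_values.index = toy["labels"]
print(log_values)
```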
In [39]:
print datetime.now()
2016-10-29 12:51:06.935236
In [40]:
%Rpush forHeatmap

To identify the number of clusters that best describes our results, we plot the within-groups sum of squares as a function of the number of clusters. Choosing this number is often more an art than a science, but the point beyond which the within-groups sum of squares no longer decreases appreciably as clusters are added is a reasonable choice for the number of clusters.

In [41]:
%%R
library('gplots')
library('Gviz')

mat=data.matrix(forHeatmap)
# Determine number of clusters
wss <- (nrow(mat)-1)*sum(apply(mat,2,var))
for (i in 2:15) wss[i] <- sum(kmeans(mat, 
                                     centers=i)$withinss)
plot(1:15, wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares")
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: 
Attaching package: ‘gplots’


  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: The following object is masked from ‘package:stats’:

    lowess


  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Loading required package: grid

  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Loading required package: BiocGenerics

  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Loading required package: parallel

  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: 
Attaching package: ‘BiocGenerics’


  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB


  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: The following object is masked from ‘package:stats’:

    xtabs


  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, intersect,
    is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
    pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int,
    rownames, sapply, setdiff, sort, table, tapply, union, unique,
    unlist, unsplit


  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Loading required package: S4Vectors

  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Loading required package: stats4

  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Loading required package: IRanges

  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: 
Attaching package: ‘IRanges’


  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: The following object is masked from ‘package:gplots’:

    space


  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Loading required package: GenomeInfoDb

  warnings.warn(x, RRuntimeWarning)
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/rpy2-2.8.3-py2.7-linux-x86_64.egg/rpy2/rinterface/__init__.py:185: RRuntimeWarning: Loading required package: GenomicRanges

  warnings.warn(x, RRuntimeWarning)
In [42]:
print datetime.now()
2016-10-29 12:51:16.812292
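
For readers who prefer Python, the within-groups sum of squares curve computed in the R cell above can be sketched with NumPy alone. This is an illustrative re-implementation under stated assumptions, not a port of R's kmeans(): `kmeans_wss` is a hypothetical helper that runs plain Lloyd iterations from a deterministic farthest-point start (R picks random starting centres, so its curve can vary between runs), and the toy points are invented.

```python
import numpy as np

def kmeans_wss(points, k, n_iter=50):
    """Run a basic k-means and return the within-groups sum of squares."""
    points = np.asarray(points, dtype=float)
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Squared distance of every point to every centre, then nearest centre.
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(dist.min(axis=1).sum())

# Three tight, well separated groups of invented 2D points.
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
                [9.0, 0.0], [9.1, 0.0], [9.0, 0.1]])

wss = [kmeans_wss(toy, k) for k in range(1, 6)]
for k, value in enumerate(wss, start=1):
    print("k=%d  wss=%.3f" % (k, value))
```

On data like this the curve drops sharply up to the true number of groups and flattens afterwards, which is the "elbow" one looks for in the plot.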
In [43]:
%%R
nclusters=4


# we start by plotting the heatmap
pdf("~/work/results/figures/Figure20.pdf",height = 48*3, width = 48)
heat=heatmap.2(mat,
               scale="none",
               Rowv=TRUE,
               Colv=TRUE,
               srtCol=0,
               offsetCol=8,
               margins=c(10,1), # ("margin.Y", "margin.X")
               trace='none', 
               symkey=FALSE, 
               symbreaks=FALSE, 
               dendrogram='both',
               density.info='none', 
               denscol="black",
               keysize=0,
               cexRow=0.2,
               cexCol=10,
               #( "bottom.margin", "left.margin", "top.margin", "right.margin" )
               key.par=list(mar=c(10.5,0,3,0)),
               # lmat -- added 2 lattice sections (5 and 6) for padding
               lmat=rbind(c(10,7, 4, 6), c(3,1, 2, 8), c(7,9,5,11)), lhei=c(0.25, 5,0.2), lwid=c(4,0.75, 10, 1))
dev.off()

# we extract the row dendrogram and cut the tree at the level 
# indicated by the plot above (nclusters=4)
hc <- as.hclust( heat$rowDendrogram)
groups <- cutree(hc, k=nclusters)
final_cluster=data.frame(groups[hc$order])

# replot the heatmap with RowSideColors=as.character(groups) in order to 
# mark the clusters in a visual fashion
pdf("~/work/results/figures/Figure20.pdf",height = 48*3, width = 48)
heat=heatmap.2(mat,
               scale="none",
               Rowv=TRUE,
               Colv=TRUE,
               srtCol=0,
               offsetCol=8,
               margins=c(10,1), # ("margin.Y", "margin.X")
               trace='none', 
               symkey=FALSE, 
               symbreaks=FALSE, 
               dendrogram='both',
               density.info='none', 
               denscol="black",
               keysize=0,
               cexRow=0.2,
               cexCol=10,
               RowSideColors=as.character(groups),
               #( "bottom.margin", "left.margin", "top.margin", "right.margin" )
               key.par=list(mar=c(10.5,0,3,0)),
               # lmat -- added 2 lattice sections (5 and 6) for padding
               lmat=rbind(c(10,7, 4, 6), c(3,1, 2, 8), c(7,9,5,11)), lhei=c(0.25, 5,0.2), lwid=c(4,0.75, 10, 1))
dev.off()
png 
  2 
In [44]:
# With wand we can display a PDF on our screen.
# This is quite practical here, as plotting directly with the R
# heatmap.2 function above returned an error regarding the margins
# and our display settings.
print datetime.now()
from wand.image import Image as WImage
img = WImage(filename=os.path.expanduser("~")+"/work/results/figures/Figure20.pdf")
img 
2016-10-29 12:51:18.778639
Out[44]:
In [45]:
%Rpull final_cluster
print datetime.now()
clusters=final_cluster.reset_index(inplace=False,drop=False)
clusters=clusters.sort_index(ascending=False,inplace=False)
clusters.columns=["labels","cluster"]
clusters=clusters.reset_index(inplace=False,drop=True)

clusters=pd.merge(clusters,forHeatmap_,on=["labels"],how="left")
clusters["gene_name"]=clusters["labels"].apply( lambda x: x.split("_")[0] )
clusters["gene_id"]=clusters["labels"].apply( lambda x: x.split("_")[1] )
clusters["ensembl_gene_id"]=clusters["gene_id"].apply( lambda x: x.split(".")[0])
2016-10-29 12:51:28.088151
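
Splitting the label on "_" recovers gene_name and gene_id only when neither of them contains an underscore; identifiers shaped like the gSpikein_ERCC ones printed earlier would be truncated if they reached this step. Since forHeatmap_ already carries gene_id and gene_name through the merge, one alternative is to look the values up instead of re-splitting the label. The sketch below illustrates the difference in plain Python; the records and the helper `make_label` are invented for illustration.

```python
# Invented records; the second gene id contains an underscore on purpose.
records = [
    {"gene_name": "geneA", "gene_id": "G1.1"},
    {"gene_name": "nan", "gene_id": "gSpike_X-1"},
]

def make_label(record):
    """Build a label the same way as above: gene_name + "_" + gene_id."""
    return "%s_%s" % (record["gene_name"], record["gene_id"])

# Map each label back to the record it was built from.
lookup = dict((make_label(r), r) for r in records)

label = "nan_gSpike_X-1"
naive_gene_id = label.split("_")[1]           # splitting loses part of the id
looked_up_gene_id = lookup[label]["gene_id"]  # the lookup returns it intact
print("split: %s | lookup: %s" % (naive_gene_id, looked_up_gene_id))

# Dropping the version suffix works the same way on the looked-up id.
ensembl_like_id = lookup["geneA_G1.1"]["gene_id"].split(".")[0]
```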

We plot the gene expression level of each cluster separately.

In [46]:
print datetime.now()
fig = plt.figure(figsize=(20,20))
for i in list(set(clusters["cluster"].tolist())):
    ax = fig.add_subplot(2,2,i)
    df=clusters[clusters["cluster"]==i]
    n=len(df)
    
    control=df[["control"]].mean()[0]
    shRNA=df[["shRNA"]].mean()[0]

    control_=df[["control"]].sem()[0]
    shRNA_=df[["shRNA"]].sem()[0]

    ax.set_title('cluster %s (n=%s)' %(str(i),str(n)))
    if i in [1,3]:
        ax.set_ylabel("log10(counts)")
    if i not in [3,4]:
        plt.tick_params(
            axis='x',          # changes apply to the x-axis
            which='both',      # both major and minor ticks are affected
            bottom='off',      # ticks along the bottom edge are off
            top='off',         # ticks along the top edge are off
            labelbottom='off') # labels along the bottom edge are off
    else:
        ax.set_xticklabels(["", "control","","shRNA", ""])
        plt.tick_params(
            axis='x',          # changes apply to the x-axis
            which='both',      # both major and minor ticks are affected
            bottom='off',      # ticks along the bottom edge are off
            top='off',         # ticks along the top edge are off
            labelbottom='on') # labels along the bottom edge are on
    ax.set_xlim(0.5,2.5)

    ax.errorbar( [1,2], [control, shRNA], yerr=[control_, shRNA_], fmt='-',color="black")
plt.tight_layout()
plt.savefig(outFigures+"Figure21.png",dpi=300,bbox_inches='tight', pad_inches=0.1,format='png')
plt.savefig(outFigures+"Figure21.svg",dpi=300,bbox_inches='tight', pad_inches=0.1,format='svg')
plt.show()
2016-10-29 12:51:28.119542
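
The points and error bars plotted above are the per-cluster mean and standard error of the mean (SEM). As a reminder of what is being computed, here is a minimal sketch in plain Python; `mean_and_sem` is a hypothetical helper and the values are invented. It uses the sample standard deviation (n-1 in the denominator), which is also the default of the pandas `sem()` method.

```python
import math

def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    mean = sum(values) / float(n)
    if n < 2:
        # The SEM is undefined for a single observation.
        return mean, float("nan")
    variance = sum((v - mean) ** 2 for v in values) / float(n - 1)
    return mean, math.sqrt(variance) / math.sqrt(n)

# Invented log10 expression values of one cluster in one condition.
toy_values = [1.0, 2.0, 3.0]
toy_mean, toy_sem = mean_and_sem(toy_values)
print("mean=%.3f sem=%.3f" % (toy_mean, toy_sem))
```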

10. Writing report tables

In [47]:
print datetime.now()
clusters_=clusters[["gene_id","cluster"]]

def writeExcel(df,filename,list_of_filters,refcol,clusters_=clusters_):
    """
    Writes an excel file with a given dataframe annotated with the cluster identifier \
    and creates one sheet for each list in the list of filters.
    
    :param df: a dataframe
    :param filename: /path/to/file.xlsx to be written
    :param list_of_filters: a list of filters, each filter being a list of ids to keep
    :param refcol: header of the column to use when applying filters
    :param clusters_: a dataframe containing ids and the cluster number for each id
    
    :returns: nothing
    """   
    
    Exc=pd.ExcelWriter(filename)
    if refcol == "transcript_id":
        clusters_b=clusters_.copy()
        clusters_b["ensembl_gene_id"]=clusters_b["gene_id"].apply(lambda x: x.split(".")[0])
        df_=pd.merge(df,clusters_b,on=["ensembl_gene_id"],how="left")
    else:
        df_=pd.merge(df,clusters_,on=["gene_id"],how="left")
    df_.to_excel(Exc, "all",index=None)
    for l,label in zip(list_of_filters,["A","B","C","D","E","F","G"][:len(list_of_filters)]):
        tmp=df_[df_[refcol].isin(l)]
        tmp.to_excel(Exc,label,index=None)
    Exc.save()
    
    
writeExcel(dfGenesAn,outFolder+"diff.gene.expression.xlsx",[redGenesOutA,redGenesOutB,redGenesOutC,redGenesOutD,redGenesOutE],"gene_id")       
writeExcel(dfTranscriptsAn,outFolder+"diff.transcrits.expression.xlsx",[redTranscriptsOutA,redTranscriptsOutB,redTranscriptsOutC,redTranscriptsOutD,redTranscriptsOutE],"transcript_id")       
writeExcel(dfTargetsAn,outFolder+"target.genes.xlsx",[redGenesOutA,redGenesOutB,redGenesOutC,redGenesOutD,redGenesOutE],"gene_id")       
2016-10-29 12:51:33.067877
In [48]:
print datetime.now()
2016-10-29 12:52:12.340187

11. Enrichment analysis with DAVID

The Database for Annotation, Visualisation and Integrated Discovery (DAVID) is a powerful tool for the analysis of enriched GO terms in a gene set. DAVID also allows the analysis of terms from other databases, for example KEGG, PFAM, and OMIM. DAVID has a user-friendly web frontend as well as a practical API. Here we use the DAVID API to programmatically analyse all our gene sets.
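
To give an intuition for what "enriched" means here, the sketch below computes a plain hypergeometric over-representation p-value in pure Python. It is not DAVID's computation: DAVID reports a modified Fisher exact p-value (the EASE score), whereas this is the unmodified upper tail. The function names and all numbers are invented for illustration.

```python
import math

def n_choose_k(n, k):
    """Binomial coefficient, returning 0 when k is out of range."""
    if k < 0 or k > n:
        return 0
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

def enrichment_pvalue(hits, list_size, term_size, background_size):
    """Probability of seeing at least `hits` genes of a term in a gene list.

    The list of `list_size` genes is assumed to be drawn at random from
    `background_size` genes, of which `term_size` carry the term.
    """
    total = n_choose_k(background_size, list_size)
    upper = min(list_size, term_size)
    tail = sum(n_choose_k(term_size, x) *
               n_choose_k(background_size - term_size, list_size - x)
               for x in range(hits, upper + 1))
    return tail / float(total)

# Invented example: 3 of our 4 genes carry a term annotated to 5 of 20 genes.
p_toy = enrichment_pvalue(hits=3, list_size=4, term_size=5, background_size=20)
print("p = %.4f" % p_toy)
```

A small p-value indicates that the term occurs in the gene list more often than expected from its frequency in the background.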

In [49]:
print datetime.now()

def EnrichmentsFromFile(filename,DAVIDuser,parsedGTF,refCol="ensembl_gene_id",dgeCol=None):
    """
    Performs DAVID enrichment analysis for each sheet in an excel file 
    
    :param filename: /path/to/file.xlsx to analyse
    :param DAVIDuser: a registered email address in DAVID https://david.ncifcrf.gov
    :param parsedGTF: a parsed GTF with at least refCol with the id (eg. 'ensembl_gene_id') and a 'gene_name' as retrieved from parseGTF()
    :param refCol: the header of the column containing the identifiers
    :param dgeCol: if gene expression values are also reported in parsedGTF the header of the column should be here
    
    :returns: /path/to/generated/report.xlsx
    """
    
    ExcIN=pd.ExcelFile(filename)
    sheets=ExcIN.sheet_names
    for s in sheets:
        gene_ids=ExcIN.parse(s)
        label=s
        print s
        if s=="all":
            if os.path.basename(filename) != "target.genes.xlsx":
                gene_ids=gene_ids[gene_ids["sig"]=="yes"]
                label="sig"
        gene_ids=gene_ids["ensembl_gene_id"].tolist()
        DAVID=age.DAVIDenrich('ENSEMBL_GENE_ID',"GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,KEGG_PATHWAY,PFAM,PROSITE,GENETIC_ASSOCIATION_DB_DISEASE,OMIM_DISEASE",\
                              DAVIDuser,gene_ids)
        if type(DAVID) != type(None):
            print "Curating DAVID for file %s, sheet %s" %(filename,label)
            sys.stdout.flush()
            DAVID["genes_names"]=DAVID["geneIds"].apply(lambda x: age.DAVIDgetGeneAttribute(x,parsedGTF,refCol=refCol))
            if dgeCol:
                DAVID["genes_log2FC"]=DAVID["geneIds"].apply(lambda x: age.DAVIDgetGeneAttribute(x,parsedGTF,refCol=refCol,fieldTOretrieve=dgeCol)) 
            filenameOUT=os.path.dirname(filename)+"/"+os.path.basename(filename).split("xlsx")[0]+"DAVID."+label+".xlsx"
            ExcOUT=pd.ExcelWriter(filenameOUT)
            for c in list(set(DAVID["categoryName"].tolist())):
                DAVID_=DAVID[DAVID["categoryName"]==c]
                DAVID_.to_excel(ExcOUT, c,index=None)
            ExcOUT.save()
    return filenameOUT
    
fixGTF=parsedGTF[["gene_id","gene_name"]]
fixGTF["ensembl_gene_id"]=fixGTF["gene_id"].apply(lambda x: x.split(".")[0])
fixGTF=fixGTF[["ensembl_gene_id","gene_name"]]
for f in ["diff.gene.expression.xlsx","diff.transcrits.expression.xlsx","target.genes.xlsx"]:
    res=EnrichmentsFromFile(outFolder+f,DAVIDuser=DAVIDuser,parsedGTF=fixGTF)
2016-10-29 12:52:12.413594
/opt/conda/envs/ipykernel_py2/lib/python2.7/site-packages/ipykernel/__main__.py:43: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
all
Curating DAVID for file /home/jovyan/work/results/diff.gene.expression.xlsx, sheet sig
A
Curating DAVID for file /home/jovyan/work/results/diff.gene.expression.xlsx, sheet A
B
Curating DAVID for file /home/jovyan/work/results/diff.gene.expression.xlsx, sheet B
C
Curating DAVID for file /home/jovyan/work/results/diff.gene.expression.xlsx, sheet C
D
Curating DAVID for file /home/jovyan/work/results/diff.gene.expression.xlsx, sheet D
E
Curating DAVID for file /home/jovyan/work/results/diff.gene.expression.xlsx, sheet E
all
Curating DAVID for file /home/jovyan/work/results/diff.transcrits.expression.xlsx, sheet sig
A
Curating DAVID for file /home/jovyan/work/results/diff.transcrits.expression.xlsx, sheet A
B
Curating DAVID for file /home/jovyan/work/results/diff.transcrits.expression.xlsx, sheet B
C
Curating DAVID for file /home/jovyan/work/results/diff.transcrits.expression.xlsx, sheet C
D
Curating DAVID for file /home/jovyan/work/results/diff.transcrits.expression.xlsx, sheet D
E
Curating DAVID for file /home/jovyan/work/results/diff.transcrits.expression.xlsx, sheet E
all
Curating DAVID for file /home/jovyan/work/results/target.genes.xlsx, sheet all
A
Curating DAVID for file /home/jovyan/work/results/target.genes.xlsx, sheet A
B
Curating DAVID for file /home/jovyan/work/results/target.genes.xlsx, sheet B
C
Curating DAVID for file /home/jovyan/work/results/target.genes.xlsx, sheet C
D
Curating DAVID for file /home/jovyan/work/results/target.genes.xlsx, sheet D
E
Curating DAVID for file /home/jovyan/work/results/target.genes.xlsx, sheet E
In [50]:
print datetime.now()
# one workbook per cluster; no custom background (ids_bg) is passed to DAVID here
for i in list(set(clusters["cluster"].tolist())):
    ENSEMBL_GENE_IDs=clusters[clusters["cluster"]==i]["ensembl_gene_id"].tolist()
    writer = pd.ExcelWriter(outFolder+"cluster_%s_NoBackCorrection.xlsx" %str(i))
    dfGenesAn[dfGenesAn["ensembl_gene_id"].isin(ENSEMBL_GENE_IDs)].to_excel(writer,"genes",index=False)
    DAVID=age.DAVIDenrich('ENSEMBL_GENE_ID',"GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,KEGG_PATHWAY,PFAM,PROSITE,GENETIC_ASSOCIATION_DB_DISEASE,OMIM_DISEASE",\
                   DAVIDuser,ENSEMBL_GENE_IDs)
    if DAVID is not None:
        DAVID["genes_names"]=DAVID["geneIds"].apply(lambda x: age.DAVIDgetGeneAttribute(x,fixGTF))
        # one sheet per annotation category
        for c in list(set(DAVID["categoryName"].tolist())):
            DAVID_=DAVID[DAVID["categoryName"]==c]
            DAVID_.to_excel(writer,c,index=False)
    writer.save()
2016-10-29 13:52:02.495357
In [51]:
print datetime.now()
# same analysis as above, but with all clustered genes passed to DAVID as the background (ids_bg)
ids_bg=clusters["ensembl_gene_id"].tolist()
for i in list(set(clusters["cluster"].tolist())):
    ENSEMBL_GENE_IDs=clusters[clusters["cluster"]==i]["ensembl_gene_id"].tolist()
    writer = pd.ExcelWriter(outFolder+"cluster_%s_BackCorrection.xlsx" %str(i))
    dfGenesAn[dfGenesAn["ensembl_gene_id"].isin(ENSEMBL_GENE_IDs)].to_excel(writer,"genes",index=False)
    DAVID=age.DAVIDenrich('ENSEMBL_GENE_ID',"GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT,KEGG_PATHWAY,PFAM,PROSITE,GENETIC_ASSOCIATION_DB_DISEASE,OMIM_DISEASE",\
                   DAVIDuser,ENSEMBL_GENE_IDs,ids_bg=ids_bg)
    if DAVID is not None:
        DAVID["genes_names"]=DAVID["geneIds"].apply(lambda x: age.DAVIDgetGeneAttribute(x,fixGTF))
        # one sheet per annotation category
        for c in list(set(DAVID["categoryName"].tolist())):
            DAVID_=DAVID[DAVID["categoryName"]==c]
            DAVID_.to_excel(writer,c,index=False)
    writer.save()
2016-10-29 13:58:14.286432

12. Generating MOTIFs with MEME

The plots above show that eCLIP is biased towards highly expressed transcripts when identifying KHSRP targets. We therefore use the peaks identified by eCLIP to generate a consensus KHSRP target motif. We will later scan all human transcripts with this motif to identify target transcripts in an unbiased way.

Here we use meme-chip, part of the MEME suite. For eCLIP, meme-chip requires 100-base sequences centred on the peak centre.
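To make the windowing concrete, the following minimal sketch computes a 100-base window around a peak centre in plain Python. The function name centered_window and the example coordinates are illustrative assumptions and not part of the notebook, which performs the same arithmetic on a DataFrame.

```python
def centered_window(chrom_start, chrom_end, half_width=50):
    """Return (start, end) of a window of 2*half_width bases around the peak centre."""
    # Mean of the two coordinates, truncated to an integer
    # (mirrors .mean(axis=1) followed by .astype(int) in pandas).
    center = int((chrom_start + chrom_end) / 2.0)
    return center - half_width, center + half_width


# Hypothetical peak spanning 1000-1037: the centre is 1018 (1018.5 truncated).
start, end = centered_window(1000, 1037)
window_length = end - start
```

Note that a peak lying closer than 50 bases to the start of a chromosome would yield a negative start coordinate; this sketch does not clamp such windows.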

In [52]:
print datetime.now()
# keep only the first 10 columns and drop duplicated peaks
cols=dfTargetsAn.columns.tolist()[:10]
dfTargetsBED=dfTargetsAn[cols].drop_duplicates()
dfTargetsBED.reset_index(inplace=True, drop=True)

# peak centre, truncated to an integer
dfTargetsBED["center"]=dfTargetsBED[["chromStart","chromEnd"]].mean(axis=1)
dfTargetsBED["center"]=dfTargetsBED["center"].astype(int)
# 100-base window: 50 bases on each side of the centre
dfTargetsBED["chromStart"]=dfTargetsBED["center"]-50
dfTargetsBED["chromEnd"]=dfTargetsBED["center"]+50

for i in ["chromStart","chromEnd"]:
    dfTargetsBED[i]=dfTargetsBED[i].astype(int)

dfTargetsBED=dfTargetsBED.drop(["center"],axis=1)
2016-10-29 14:00:18.452621
In [53]:
print datetime.now()
dfTargetsBED=dfTargetsBED.sort_values(by=["chrom","chromStart","chromEnd"])
dfTargetsBED.reset_index(inplace=True, drop=True)
age.writeBED(dfTargetsBED, outFolder+"/targets.bed")
2016-10-29 14:00:18.548756
In [54]:
print datetime.now()
2016-10-29 14:00:18.597838

Having generated the BED file with the regions of interest, we use bedtools getfasta to extract the corresponding sequences from the reference genome fasta file in a strand-specific fashion (option -s).
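What strand-specific extraction means can be illustrated with a small sketch: intervals on the minus strand are returned as the reverse complement of the reference sequence. The helper extract_stranded and the toy sequence below are hypothetical and only mimic the idea behind strand-aware extraction; they do not reproduce how bedtools is implemented.

```python
# Complement table for DNA bases; anything else is mapped to N in this sketch.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def extract_stranded(chrom_seq, start, end, strand):
    """Extract chrom_seq[start:end] (BED style: 0-based start, end-exclusive).

    Minus-strand intervals are reverse-complemented.
    """
    sub = chrom_seq[start:end].upper()
    if strand == "-":
        return "".join(COMPLEMENT.get(base, "N") for base in reversed(sub))
    return sub


toy_chrom = "AAACGT"  # hypothetical reference sequence
plus_seq = extract_stranded(toy_chrom, 1, 5, "+")
minus_seq = extract_stranded(toy_chrom, 1, 5, "-")
```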

As we deal with RNA and not DNA, we convert all "T"s to "U"s with sed. Otherwise, meme-chip might start looking for motifs on the reverse strand as well.
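For readers less familiar with sed, the conversion can be sketched in Python: header lines starting with > are left untouched, and every t or T on the remaining lines becomes u or U. The function name dna_to_rna_fasta and the example record are assumptions made for illustration.

```python
def dna_to_rna_fasta(lines):
    """Replace t/T with u/U on sequence lines, leaving FASTA headers unchanged."""
    converted = []
    for line in lines:
        if line.startswith(">"):
            # Header line: keep as-is, even if it contains the letter T.
            converted.append(line)
        else:
            converted.append(line.replace("t", "u").replace("T", "U"))
    return converted


# Hypothetical FASTA record; note that the T in the header is preserved.
example = [">chrT_target_1", "ACGTtgca"]
rna = dna_to_rna_fasta(example)
```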

In [55]:
%%bash
cd ~/work/results/rsem-results
samtools faidx genome.fa
bedtools getfasta -s -name -fi ~/work/results/rsem-results/genome.fa -bed ~/work/results/targets.bed -fo ~/work/results/targets.fa
In [56]:
print datetime.now()
2016-10-29 14:00:40.624606
In [57]:
%%bash
cd ~/work/results/
sed '/^>/! y/tT/uU/' < targets.fa > targets.rna.fa
meme-chip -oc meme-chip_output -rna -db ~/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme targets.rna.fa
Starting getsize: getsize meme-chip_output/targets.rna.fa 1> $metrics
getsize ran successfully in 0.019052 seconds
Starting fasta-most: fasta-most -min 50 < meme-chip_output/targets.rna.fa 1> $metrics
fasta-most ran successfully in 0.063601 seconds
Starting fasta-center: fasta-center -rna -len 100 < meme-chip_output/targets.rna.fa 1> meme-chip_output/seqs-centered
fasta-center ran successfully in 0.102547 seconds
Starting fasta-shuffle-letters: fasta-shuffle-letters meme-chip_output/seqs-centered meme-chip_output/seqs-shuffled -kmer 2 -tag -dinuc -rna -seed 1
fasta-shuffle-letters ran successfully in 0.062935 seconds
Starting fasta-subsample: fasta-subsample meme-chip_output/seqs-centered 600 -rest meme-chip_output/seqs-discarded -seed 1 1> meme-chip_output/seqs-sampled
fasta-subsample ran successfully in 0.124387 seconds
Starting fasta-get-markov: fasta-get-markov -nostatus -nosummary -rna -m 1 meme-chip_output/targets.rna.fa meme-chip_output/background
fasta-get-markov ran successfully in 0.018278 seconds
Starting meme: meme meme-chip_output/seqs-sampled -oc meme-chip_output/meme_out -mod zoops -nmotifs 3 -minw 6 -maxw 30 -bfile meme-chip_output/background -rna -nostatus
meme ran successfully in 421.621916 seconds
Starting dreme: dreme -v 1 -oc meme-chip_output/dreme_out -png -rna -p meme-chip_output/seqs-centered -n meme-chip_output/seqs-shuffled
dreme ran successfully in 250.283489 seconds
Starting centrimo: centrimo -seqlen 100 -verbosity 1 -oc meme-chip_output/centrimo_out -bfile meme-chip_output/background meme-chip_output/targets.rna.fa meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
centrimo ran successfully in 1.252656 seconds
Starting tomtom: tomtom -verbosity 1 -oc meme-chip_output/meme_tomtom_out -min-overlap 5 -dist pearson -evalue -thresh 1 -no-ssc -bfile meme-chip_output/background meme-chip_output/meme_out/meme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
tomtom ran successfully in 1.970017 seconds
Starting tomtom: tomtom -verbosity 1 -oc meme-chip_output/dreme_tomtom_out -min-overlap 5 -dist pearson -evalue -thresh 1 -no-ssc -bfile meme-chip_output/background meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
tomtom ran successfully in 0.538962 seconds
Starting tomtom: tomtom -verbosity 1 -text -thresh 0.1 meme-chip_output/combined.meme meme-chip_output/combined.meme 1> meme-chip_output/motif_alignment.txt
tomtom ran successfully in 0.741587 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_1 -bgfile meme-chip_output/background -primary RMAUGU meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.249633 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_2 -bgfile meme-chip_output/background -primary UWCWG meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.220103 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_3 -bgfile meme-chip_output/background -primary RNCMPT00187 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.262371 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_4 -bgfile meme-chip_output/background -primary 1 meme-chip_output/targets.rna.fa meme-chip_output/meme_out/meme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.247859 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_5 -bgfile meme-chip_output/background -primary RYAUUU meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.219249 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_6 -bgfile meme-chip_output/background -primary RNCMPT00081 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.262293 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_7 -bgfile meme-chip_output/background -primary RNCMPT00183 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.25771 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_8 -bgfile meme-chip_output/background -primary 2 meme-chip_output/targets.rna.fa meme-chip_output/meme_out/meme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.257986 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_9 -bgfile meme-chip_output/background -primary 3 meme-chip_output/targets.rna.fa meme-chip_output/meme_out/meme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.253933 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_10 -bgfile meme-chip_output/background -primary UWUUAAAA meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.219708 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_11 -bgfile meme-chip_output/background -primary RNCMPT00164 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.259883 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_12 -bgfile meme-chip_output/background -primary UGUAHAU meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.2182 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_13 -bgfile meme-chip_output/background -primary RGAAR meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.214805 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_14 -bgfile meme-chip_output/background -primary RCUGU meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.219522 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_15 -bgfile meme-chip_output/background -primary RNCMPT00177 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.247971 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_16 -bgfile meme-chip_output/background -primary UUYUC meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.225719 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_17 -bgfile meme-chip_output/background -primary RNCMPT00094 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.255083 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_18 -bgfile meme-chip_output/background -primary CGYUGGGA meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.221656 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_19 -bgfile meme-chip_output/background -primary UUCAYUU meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.217639 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_20 -bgfile meme-chip_output/background -primary CASAG meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.219912 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_21 -bgfile meme-chip_output/background -primary RNCMPT00184 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.257799 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_22 -bgfile meme-chip_output/background -primary UGGGGAUR meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.213121 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_23 -bgfile meme-chip_output/background -primary RNCMPT00040 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.254637 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_24 -bgfile meme-chip_output/background -primary GUUAUUGY meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.219819 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_25 -bgfile meme-chip_output/background -primary RNCMPT00126 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.25639 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_26 -bgfile meme-chip_output/background -primary YGUGUGUG meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.219818 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_27 -bgfile meme-chip_output/background -primary RNCMPT00123 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.258439 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_28 -bgfile meme-chip_output/background -primary USCUCUGU meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.21857 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_29 -bgfile meme-chip_output/background -primary GAAUUYCU meme-chip_output/targets.rna.fa meme-chip_output/dreme_out/dreme.xml meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.231256 seconds
Starting spamo: spamo -verbosity 1 -oc meme-chip_output/spamo_out_30 -bgfile meme-chip_output/background -primary RNCMPT00217 meme-chip_output/targets.rna.fa /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/meme_out/meme.xml meme-chip_output/dreme_out/dreme.xml /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme
spamo ran successfully in 0.256247 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_1 --bgfile meme-chip_output/background --motif RMAUGU meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.190989 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_2 --bgfile meme-chip_output/background --motif UWCWG meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.152011 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_3 --bgfile meme-chip_output/background --motif RNCMPT00187 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.213867 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_4 --bgfile meme-chip_output/background --motif 1 meme-chip_output/meme_out/meme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.190569 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_5 --bgfile meme-chip_output/background --motif RYAUUU meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.145903 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_6 --bgfile meme-chip_output/background --motif RNCMPT00081 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.190941 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_7 --bgfile meme-chip_output/background --motif RNCMPT00183 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.199312 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_8 --bgfile meme-chip_output/background --motif 2 meme-chip_output/meme_out/meme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.185331 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_9 --bgfile meme-chip_output/background --motif 3 meme-chip_output/meme_out/meme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.186843 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_10 --bgfile meme-chip_output/background --motif UWUUAAAA meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.14202 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_11 --bgfile meme-chip_output/background --motif RNCMPT00164 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.188435 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_12 --bgfile meme-chip_output/background --motif UGUAHAU meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.143212 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_13 --bgfile meme-chip_output/background --motif RGAAR meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.144987 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_14 --bgfile meme-chip_output/background --motif RCUGU meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.143138 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_15 --bgfile meme-chip_output/background --motif RNCMPT00177 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.19583 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_16 --bgfile meme-chip_output/background --motif UUYUC meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.142276 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_17 --bgfile meme-chip_output/background --motif RNCMPT00094 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.199669 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_18 --bgfile meme-chip_output/background --motif CGYUGGGA meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.15488 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_19 --bgfile meme-chip_output/background --motif UUCAYUU meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.145119 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_20 --bgfile meme-chip_output/background --motif CASAG meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.143744 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_21 --bgfile meme-chip_output/background --motif RNCMPT00184 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.201346 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_22 --bgfile meme-chip_output/background --motif UGGGGAUR meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.158891 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_23 --bgfile meme-chip_output/background --motif RNCMPT00040 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.200742 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_24 --bgfile meme-chip_output/background --motif GUUAUUGY meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.155388 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_25 --bgfile meme-chip_output/background --motif RNCMPT00126 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.204396 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_26 --bgfile meme-chip_output/background --motif YGUGUGUG meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.164725 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_27 --bgfile meme-chip_output/background --motif RNCMPT00123 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.202749 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_28 --bgfile meme-chip_output/background --motif USCUCUGU meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.155775 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_29 --bgfile meme-chip_output/background --motif GAAUUYCU meme-chip_output/dreme_out/dreme.xml meme-chip_output/targets.rna.fa
fimo ran successfully in 0.155287 seconds
Starting fimo: fimo --parse-genomic-coord --verbosity 1 --oc meme-chip_output/fimo_out_30 --bgfile meme-chip_output/background --motif RNCMPT00217 /home/jovyan/software/motif_databases/RNA/Ray2013_rbp_All_Species.meme meme-chip_output/targets.rna.fa
fimo ran successfully in 0.186288 seconds
Log::Log4perl configuration looks suspicious: No loggers defined at /usr/local/share/perl/5.20.2/Log/Log4perl/Config.pm line 322.
Warning: <motif> element has no '' key attribute at /home/jovyan/software/bin/meme-chip line 1144.
Warning: <motif> element has no '' key attribute at /home/jovyan/software/bin/meme-chip line 1144.
Warning: <motif> element has no '' key attribute at /home/jovyan/software/bin/meme-chip line 1144.
Warning: <motif> element has no '' key attribute at /home/jovyan/software/bin/meme-chip line 1144.

Before using a motif to identify targets on a transcriptome-wide scale, we require it to be valid: the number of sites at which meme-chip found the motif must be at least 5% of the number of sites submitted for motif generation.
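The 5% rule can be sketched as follows. This is a minimal illustration and not the implementation of age.filterMotifs, whose internals are not shown here. It assumes that each motif in a MEME-format file is introduced by a MOTIF line and that its site count is given by the nsites= field of the following letter-probability matrix line. The motif names, counts and number of input sites in the example are invented.

```python
import re


def motif_site_counts(meme_text):
    """Return a list of (motif_name, nsites) pairs parsed from MEME-format text."""
    counts = []
    current = None
    for line in meme_text.splitlines():
        if line.startswith("MOTIF"):
            current = line.split()[1]
        elif current is not None and "nsites=" in line:
            match = re.search(r"nsites=\s*(\d+)", line)
            if match:
                counts.append((current, int(match.group(1))))
                current = None
    return counts


def select_motifs(counts, n_input_sites, fraction=0.05):
    """Keep motifs whose site count is at least fraction * n_input_sites."""
    min_sites = n_input_sites * fraction
    return [name for name, nsites in counts if nsites >= min_sites]


# Invented example with two motifs and 1000 hypothetical input sites.
example_meme = """MEME version 4

MOTIF TOY_A
letter-probability matrix: alength= 4 w= 6 nsites= 400 E= 1.0e-10

MOTIF TOY_B
letter-probability matrix: alength= 4 w= 8 nsites= 20 E= 0
"""
counts = motif_site_counts(example_meme)
selected = select_motifs(counts, n_input_sites=1000)
```

With 1000 input sites the threshold is 50 sites, so only TOY_A is kept in this toy example.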

In [58]:
print datetime.now()
minSites=len(dfTargetsBED)*0.05
newMOTIFS=age.filterMotifs(outFolder+"meme-chip_output/combined.meme",outFolder+"user.selected.meme",minSites=minSites)
2016-10-29 14:12:11.981090
MOTIF 0  1746
MOTIF 1  3120
MOTIF 2  20
MOTIF 3  30
MOTIF 4  1965
MOTIF 5  20
MOTIF 6  20
MOTIF 7  20
MOTIF 8  3
MOTIF 9  20
MOTIF 10  20
MOTIF 11  20
MOTIF 12  20
MOTIF 13  7
MOTIF 14  20
MOTIF 15  244
MOTIF 16  20
MOTIF 17  20
MOTIF 18  20
MOTIF 19  20
MOTIF 20  20
MOTIF 21  421
MOTIF 22  20
MOTIF 23  1715
MOTIF 24  1311
MOTIF 25  20
MOTIF 26  20
MOTIF 27  1781
MOTIF 28  20
MOTIF 29  20
MOTIF 30  33
MOTIF 31  20
MOTIF 32  20
MOTIF 33  20
MOTIF 34  232
MOTIF 35  360
MOTIF 36  20
MOTIF 37  20
MOTIF 38  20
MOTIF 39  20
MOTIF 40  20
MOTIF 41  20
MOTIF 42  45
MOTIF 43  20
MOTIF 44  20
MOTIF 45  20
MOTIF 46  93
MOTIF 47  20
MOTIF 48  94
MOTIF 49  20
MOTIF 50  20
MOTIF 51  48
MOTIF 52  20
MOTIF 53  36
MOTIF 54  20
MOTIF 55  20
MOTIF 56  20
In [59]:
print datetime.now()
2016-10-29 14:12:12.043453

13. Identifying target transcripts using MEME

We start by extracting the fasta sequence of each transcript using gffread, together with the annotation.gtf and genome.fa files.

As before, we convert all T to U.

Finally we run fimo to scan the transcriptome for the presence of our selected motifs.

In [60]:
%%bash
cd ~/work/results
gffread -w transcripts.fa -g rsem-results/genome.fa rsem-results/annotation.gtf
sed '/^>/! y/tT/uU/' < transcripts.fa > transcripts.rna.fa
fimo --norc --text user.selected.meme transcripts.rna.fa > fimo.output.tsv
Warning: text mode turns off computation of q-values
Using motif 0 of width 6.
Using motif 1 of width 5.
Using motif 4 of width 6.
Using motif 15 of width 8.
Using motif 21 of width 7.
Using motif 23 of width 5.
Using motif 24 of width 5.
Using motif 27 of width 5.
Using motif 34 of width 7.
Using motif 35 of width 5.
In [61]:
print datetime.now()
dfFimo=pd.read_table(outFolder+"fimo.output.tsv")
FimoTranscripts=list(set(dfFimo["sequence name"].tolist())) 
allt=list(set(dfTargetsAn["transcript_id"].tolist()))

print "Number of fimo detected transcripts:", len(FimoTranscripts)
print "Number of transcripts associated with sequences submited to MEME-ChIP:", len(allt)
print "Number of transcripts in organism:", len(list(set(parsedGTF["transcript_id"].tolist())))
print "Number of transcripts associated with sequences submited to MEME-ChIP\nwhich are also detected by FIMO:", len([ s for s in FimoTranscripts if s in allt ])

FimoGenes=list(set(parsedGTF[parsedGTF["transcript_id"].isin(FimoTranscripts)]["gene_id"].tolist()))
alltGenes=list(set(parsedGTF[parsedGTF["transcript_id"].isin(allt)]["gene_id"].tolist())) 

print "\n"
print "Number of fimo detected genes:", len(FimoGenes )
print "Number of genes associated with sequences submited to MEME-ChIP:", len(alltGenes)
print "Number of genes in organism:", len(list(set(parsedGTF["gene_id"].tolist())))
print "Number of genes associated with sequences submited to MEME-ChIP\nwhich are also detected by FIMO:", len([ s for s in FimoGenes if s in alltGenes ])
2016-10-29 14:17:35.131895
Number of fimo detected transcripts: 60257
Number of transcripts associated with sequences submited to MEME-ChIP: 3436
Number of transcripts in organism: 199349
Number of transcripts associated with sequences submited to MEME-ChIP
which are also detected by FIMO: 2238


Number of fimo detected genes: 22917
Number of genes associated with sequences submited to MEME-ChIP: 1465
Number of genes in organism: 60725
Number of genes associated with sequences submited to MEME-ChIP
which are also detected by FIMO: 1271
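
The overlap counts above come from list comprehensions that scan the second list once for every element of the first. A minimal sketch of the same count using set intersection is shown here; the IDs and variable names are hypothetical and not part of the notebook.

```python
# Hypothetical transcript IDs, for illustration only
fimo_transcripts = ["tx1", "tx2", "tx3", "tx3", "tx5"]
submitted_transcripts = ["tx2", "tx3", "tx4"]

# Sets drop duplicates and allow fast membership tests
fimo_set = set(fimo_transcripts)
submitted_set = set(submitted_transcripts)

# Transcripts present in both collections
shared = fimo_set & submitted_set
n_shared = len(shared)

# Fraction of the submitted transcripts that FIMO also reports
fraction_recovered = float(n_shared) / len(submitted_set)
```

For the counts reported above, the corresponding ratios are 2238/3436 (about 65%) of transcripts and 1271/1465 (about 87%) of genes.
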
In [62]:
print datetime.now()
TargetT_=FimoTranscripts
TargetG_=list(set(parsedGTF[parsedGTF["transcript_id"].isin(TargetT_)]["gene_id"].tolist())) 
2016-10-29 14:17:41.517889

14. Merging MEME identified targets and DGE

As before, we visualise the targets identified with the MEME suite (here, with FIMO) together with our differential gene expression results.

In [63]:
print datetime.now()
dfGenesWoutB_,redGenesOutB_=age.MA(dfGenes,'Genes',outFigures+'Figure22',list_of_comparisons,spec=TargetG_,splines=False,sizeRed=2)
print "(B_FIMO) red, RBP target genes"
sys.stdout.flush()

dfTranscritpsWoutB_,redTranscriptsOutB_=age.MA(dfTranscripts,'Transcripts',outFigures+'Figure23',list_of_comparisons,spec=TargetT_, splines=False,sizeRed=2)
print "(B_FIMO) red, RBP target transcripts"
sys.stdout.flush()

TargetG_dif_=[ s for s in TargetG_ if s in sigGenes ]
TargetT_dif_=[ s for s in TargetT_ if s in sigTranscripts ]

dfGenesWoutC_,redGenesOutC_=age.MA(dfGenes,'Genes',outFigures+'Figure24',list_of_comparisons,spec=TargetG_dif_,splines=False,sizeRed=2)
print "(C_FIMO) red, significantly changed RBP target genes"
sys.stdout.flush()

dfTranscritpsWoutC_,redTranscriptsOutC_=age.MA(dfTranscripts,'Transcripts',outFigures+'Figure25',list_of_comparisons,spec=TargetT_dif_,splines=False,sizeRed=2)
print "(C_FIMO) red, significantly changed RBP target transcripts"
sys.stdout.flush()

dfGenesWoutD_,redGenesOutD_=age.MA(dfGenes,'Genes',outFigures+'Figure26',list_of_comparisons,Targets=TargetG_dif_,sizeRed=2)
print "(D_FIMO) red, significantly changed RBP target genes out of the 0.5 percentil"
sys.stdout.flush()

dfTranscritpsWoutD_,redTranscriptsOutD_=age.MA(dfTranscripts,'Transcripts',outFigures+'Figure27',list_of_comparisons,Targets=TargetT_dif_,sizeRed=2)
print "(D_FIMO) red, significantly changed RBP target transcript out of the 0.5 percentil"
sys.stdout.flush()
2016-10-29 14:17:41.953274
(B_FIMO) red, RBP target genes
(B_FIMO) red, RBP target transcripts
(C_FIMO) red, significantly changed RBP target genes
(C_FIMO) red, significantly changed RBP target transcripts
(D_FIMO) red, significantly changed RBP target genes out of the 0.5 percentil
(D_FIMO) red, significantly changed RBP target transcript out of the 0.5 percentil
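
The internals of age.MA are not shown in this section, so the following is not its implementation. As a reminder of what an MA plot conventionally displays, this sketch computes M (the log2 ratio) and A (the mean log2 expression) for a single gene. The function name, the pseudocount, and the input values are assumptions made for illustration.

```python
import math

def ma_values(mean_control, mean_treated, pseudocount=1.0):
    """Return (M, A) for one gene from two mean expression values.

    The pseudocount is an assumption of this sketch; it avoids log(0).
    """
    c = mean_control + pseudocount
    t = mean_treated + pseudocount
    log_c = math.log(c, 2)
    log_t = math.log(t, 2)
    m = log_t - log_c          # M: log2 fold change (y axis)
    a = 0.5 * (log_t + log_c)  # A: mean log2 expression (x axis)
    return m, a

# Hypothetical expression values: 3 in control, 15 after knock-down
m, a = ma_values(3.0, 15.0)
```
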
In [64]:
print datetime.now()
plotKDE(dfGenes,'Genes',outFigures+'Figure28',TargetG_)
plotKDE(dfTranscripts,'Transcripts',outFigures+'Figure29',TargetT_)
2016-10-29 14:18:37.135485

We annotate our table of MEME-identified targets, save the respective report tables, and use DAVID for enrichment analysis.

In [65]:
print datetime.now()

dfFimo_=dfFimo.copy()
dfFimo_["transcript_id"]=dfFimo_["sequence name"]
dfFimo_=dfFimo_.drop(["sequence name"],axis=1)
dfFimoAnn=AnnotateDF(dfFimo_,"transcript_id")

FIMO_OUT=outFolder+"FIMO_filtered/"
if not os.path.exists(FIMO_OUT):
    os.makedirs(FIMO_OUT)
    
writeExcel(dfGenesAn,FIMO_OUT+"diff.gene.expression.xlsx",[redGenesOutA,redGenesOutB_,redGenesOutC_,redGenesOutD_,],"gene_id")       
writeExcel(dfTranscriptsAn,FIMO_OUT+"diff.transcrits.expression.xlsx",[redTranscriptsOutA,redTranscriptsOutB_,redTranscriptsOutC_,redTranscriptsOutD_,],"transcript_id")       
writeExcel(dfFimoAnn,FIMO_OUT+"target.genes.xlsx",[redGenesOutA,redGenesOutB_,redGenesOutC_,redGenesOutD_,],"gene_id")       

for f in ["diff.gene.expression.xlsx","diff.transcrits.expression.xlsx","target.genes.xlsx"]:
    res=EnrichmentsFromFile(FIMO_OUT+f,DAVIDuser=DAVIDuser,parsedGTF=fixGTF)
2016-10-29 14:18:38.889488
all
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.gene.expression.xlsx, sheet sig
A
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.gene.expression.xlsx, sheet A
B
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.gene.expression.xlsx, sheet B
C
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.gene.expression.xlsx, sheet C
D
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.gene.expression.xlsx, sheet D
all
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.transcrits.expression.xlsx, sheet sig
A
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.transcrits.expression.xlsx, sheet A
B
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.transcrits.expression.xlsx, sheet B
C
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.transcrits.expression.xlsx, sheet C
D
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/diff.transcrits.expression.xlsx, sheet D
all
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/target.genes.xlsx, sheet all
A
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/target.genes.xlsx, sheet A
B
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/target.genes.xlsx, sheet B
C
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/target.genes.xlsx, sheet C
D
Curating DAVID for file /home/jovyan/work/results/FIMO_filtered/target.genes.xlsx, sheet D

15. Calculating distance between target sites and stop codons

KHSRP has previously been shown to be a key mediator of mRNA decay through its interaction with AU-rich elements in target mRNAs [ref].

Given this report, we expect KHSRP target sites to be located in the 3'UTR of the respective target mRNAs. We therefore calculated the distance between each target site and the stop codon of the respective target transcript.

We start by creating a dictionary in which each transcript maps to the list of all its exonic genomic positions, eg. dic= {"ENST00001":[1001,1002,1003,...6023,6024,6025], "ENST00002":[3301,3302,3303,...5042,5043,5044]}. Note that for transcripts encoded on the "-" strand, the positions need to be reverse sorted so that the list follows the 5' to 3' direction of the transcript.

Having created this dictionary, we can use it to identify the position of each genomic coordinate on the transcript.

eg. Using the dictionary above we can see that position 1003 of "ENST00001" is the 3rd position in the transcript.

Having translated all genomic coordinates of target sites and stop codons to transcript coordinates we plot the distribution of the distances between both.
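
The mapping described above can be sketched as follows. This is not the AGEpy implementation of MAPGenoToTrans or GetTransPosition; the function names, the transcript IDs, and the exon intervals are hypothetical, and exon intervals are assumed to be 1-based and inclusive as in a GTF file.

```python
def exon_positions(exons, strand):
    """List every exonic genomic position of a transcript in 5' to 3' order.

    exons: list of (start, end) genomic intervals, 1-based inclusive.
    strand: "+" or "-"; "-" strand positions are reverse sorted.
    """
    positions = []
    for start, end in exons:
        positions.extend(range(start, end + 1))
    return sorted(set(positions), reverse=(strand == "-"))

def transcript_position(genomic_pos, positions):
    """Return the 1-based transcript coordinate, or None if not exonic."""
    try:
        return positions.index(genomic_pos) + 1
    except ValueError:
        return None

# Two hypothetical transcripts with the same exons on opposite strands
tx_map = {
    "tx_plus": exon_positions([(1001, 1005), (2001, 2003)], "+"),
    "tx_minus": exon_positions([(1001, 1005), (2001, 2003)], "-"),
}

pos_plus = transcript_position(1003, tx_map["tx_plus"])    # 3rd base
pos_minus = transcript_position(1003, tx_map["tx_minus"])  # 6th base
intronic = transcript_position(1500, tx_map["tx_plus"])    # None
```

The list lookup is linear in transcript length; for many queries, a per-transcript dictionary from genomic position to index would be faster.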

In [66]:
print datetime.now()
GTFforMAP=parsedGTF[['seqname','feature','start','end','strand','frame','gene_id','transcript_id','exon_id','exon_number']]
GenTransMapDic=age.MAPGenoToTrans(GTFforMAP,"exon")
2016-10-29 16:27:26.024323
/home/jovyan/AGEpy/AGEpy/AGEpy.py:2043: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  GenTransMap["feature_bases"]=GenTransMap.apply(getExonsPositions, axis=1)
In [67]:
print datetime.now()
stop_codons=parsedGTF[parsedGTF["feature"]=="stop_codon"][["start","end","transcript_id","strand"]]
stop_codons["mean"]=stop_codons[["start","end"]].mean(axis=1)
stop_codons=stop_codons[["transcript_id","mean","strand"]]
stop_codons["transcript_stop"]=stop_codons.apply(age.GetTransPosition,args=("mean",GenTransMapDic,),axis=1)
stop_codons=stop_codons[["transcript_id","transcript_stop","strand"]]

dfTargetsTrans=dfTargets[["chromStart","chromEnd","name","transcript_id"]].drop_duplicates().dropna()
dfTargetsTrans["mean"]=dfTargetsTrans[["chromStart","chromEnd"]].mean(axis=1)
dfTargetsTrans["transcript_target"]=dfTargetsTrans.apply(age.GetTransPosition,args=("mean",GenTransMapDic,),axis=1)
dfTargetsTrans=dfTargetsTrans[["transcript_id","transcript_target"]].dropna()
dfTargetsTrans=pd.merge(dfTargetsTrans,stop_codons,on=["transcript_id"])
dfTargetsTrans["dist"]=dfTargetsTrans["transcript_target"]-dfTargetsTrans["transcript_stop"]

dfTargetsFIMO=dfFimo.copy()
dfTargetsFIMO["transcript_target"]=dfTargetsFIMO[["start","stop"]].mean(axis=1)
dfTargetsFIMO=dfTargetsFIMO[["sequence name","transcript_target"]].drop_duplicates()
dfTargetsFIMO.columns=["transcript_id","transcript_target"]
dfTargetsFIMO=pd.merge(dfTargetsFIMO,stop_codons,on=["transcript_id"])
dfTargetsFIMO["dist"]=dfTargetsFIMO["transcript_target"]-dfTargetsFIMO["transcript_stop"] 
2016-10-29 16:30:22.328757
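
The "dist" column above is the target position minus the stop codon position, both in transcript coordinates. Assuming transcript coordinates increase from 5' to 3', a positive distance places the target downstream of the stop codon, on the 3'UTR side. A minimal sketch with hypothetical coordinates and helper names:

```python
def midpoint(start, end):
    """Mean of an interval's start and end, as used for sites above."""
    return (start + end) / 2.0

def classify(dist):
    """Describe where a target lies relative to the stop codon."""
    if dist > 0:
        return "downstream of stop codon"
    elif dist < 0:
        return "upstream of stop codon"
    return "at stop codon"

# Hypothetical transcript coordinates
stop_mid = midpoint(900, 902)      # stop codon spans 900-902
target_mid = midpoint(1000, 1010)  # target site spans 1000-1010

dist = target_mid - stop_mid
location = classify(dist)
```
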
In [68]:
print datetime.now()
def plotDIST(df,title,figName,cum=False,per=0.05,xlim=None):
    """
    Plots the distribution of the distances in a dataframe
    
    :param df: a Pandas dataframe with the column 'dist'
    :param title: plot title
    :param figName: /path/to/saved/figure/prefix
    :param cum: plot the cumulative plot instead 
    :param per: if 'cum=True', the percentile to mark (marks 'per' and '1-per')
    :param xlim: list with [lower,upper] x limits for the plot
    
    :returns: x,y of the plotted curve; if 'cum=True', x_min, x_max, (x,y)
    """

    sns.set_style("white")
    p=sns.kdeplot( df["dist"],cumulative=cum)
    x,y = p.get_lines()[0].get_data()
    
    if not cum:
        y_=list(y).index(max(list(y)))
        plt.vlines(x[y_],0,y[y_],linestyles='dotted',lw=1)
        plt.ylabel("frequency")
        plt.annotate("x=%s" %str(np.around(x[y_],2)) , xy=(x[y_]*1.05, y[y_]*1.05))#xycoords='figure points')
        plt.ylim(0,y[y_]*1.1)
    else:
        y_max=len([s for s in y if s < (1.00-per)])
        y_min=len([s for s in y if s < per])
        x_max=x[y_max-1]
        x_min=x[y_min-1]
        plt.vlines(x_min,0,y[y_min-1],linestyles='dotted',lw=1)
        plt.vlines(x_max,0,y[y_max-1],linestyles='dotted',lw=1)
        plt.ylabel("cumulative")
        
    plt.gca().spines['right'].set_visible(False)
    plt.gca().spines['top'].set_visible(False)
    plt.xlabel("dist. to stop codon (bases)")

    plt.title(title)
    plt.xticks(rotation=45)
    plt.gca().legend().set_visible(False)
    if xlim:
        plt.xlim(xlim[0],xlim[1])
    plt.savefig(figName+".png",dpi=300,bbox_inches='tight', pad_inches=0.1,format='png')
    plt.savefig(figName+".svg",dpi=300,bbox_inches='tight', pad_inches=0.1,format='svg')
    plt.show()
    if cum:
        return x_min, x_max, (x,y)
    else:
        return x,y

x_exp,y_exp=plotDIST(dfTargetsTrans,"all eCLIP targets",outFigures+'Figure30')
x_cha,y_cha=plotDIST(dfTargetsTrans[dfTargetsTrans["transcript_id"].isin(redTranscriptsOutA)],"changed eCLIP targets",outFigures+'Figure31')
x_FIMO,y_FIMO=plotDIST(dfTargetsFIMO,"all FIMO targets",outFigures+'Figure32')
x_FIMO_cha,y_FIMO_cha=plotDIST(dfTargetsFIMO[dfTargetsFIMO["transcript_id"].isin(sigTranscripts)],"changed FIMO targets",outFigures+'Figure33')
2016-10-29 16:30:45.014803
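
When plotDIST is called with cum=True, it marks the x values at which the cumulative curve crosses per and 1-per by counting the y values below each threshold. The sketch below isolates that step for a monotonically increasing curve. The function name and the data are hypothetical. One deliberate difference: if no y value lies below the threshold, the original indexes x[-1] (the last point), whereas this sketch raises an error.

```python
from bisect import bisect_left

def percentile_bounds(x, y, per=0.05):
    """Return (x_min, x_max) where a cumulative curve crosses per and 1-per.

    x, y: equally long sequences; y must be sorted in increasing order.
    """
    # bisect_left gives the number of y values strictly below the threshold
    n_low = bisect_left(y, per)
    n_high = bisect_left(y, 1.0 - per)
    if n_low == 0 or n_high == 0:
        raise ValueError("curve starts above the requested percentile")
    # Last x whose cumulative value is still below each threshold
    return x[n_low - 1], x[n_high - 1]

# Hypothetical cumulative curve of distances to the stop codon
x = [-300, -200, -100, 0, 100, 200, 300]
y = [0.01, 0.04, 0.20, 0.50, 0.80, 0.96, 1.00]

x_min, x_max = percentile_bounds(x, y, per=0.05)
```
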

16. Filtering terms of interest from enrichment tables

We have previously shown that Khsrp is regulated in response to genotoxic stress. We therefore searched all DAVID "Biological process" outputs for enriched terms containing at least one of the strings "cell cycle", "apoptosis", "cell death", "cell division", or "proliferation".
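
The term filter can be sketched in isolation as below. The term names are hypothetical, and the function name is not part of the notebook. Unlike the substring test used in the cell that follows, this sketch compares in lower case, which is an assumption about what is wanted rather than the notebook's behaviour.

```python
def filter_terms(term_names, strings_of_interest):
    """Keep terms that contain at least one string of interest."""
    wanted = [s.lower() for s in strings_of_interest]
    return [t for t in term_names if any(w in t.lower() for w in wanted)]

strings_of_interest = ["cell cycle", "apoptosis", "cell death",
                       "cell division", "proliferation"]

# Hypothetical enrichment term names
terms = ["cell cycle arrest",
         "apoptotic process",
         "regulation of cell proliferation",
         "translation",
         "Cell division"]

kept = filter_terms(terms, strings_of_interest)
```

Note that plain substring matching does not catch related word forms: in this example "apoptotic process" is not kept, because it does not contain the string "apoptosis".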

In [69]:
print datetime.now()
def EnrPlots(df,title,figName,stringsOFinterest=["cell cycle","apoptosis","cell death","cell division","proliferation"]):
    df_=df.copy()
    df_["-log10(p)"]=df_["ease"].apply( lambda x: -1*np.log10(x) )
    dfA=df_[:20]
    all_terms=df_["termName"].tolist()
    filteredTerms=[]
    for s in all_terms:
        tt=[ i for i in stringsOFinterest if i in s ]
        if len(tt)>0:
            filteredTerms.append(s)
    dfB=df_[df_["termName"].isin(filteredTerms)]
    dfB=dfB[:20]
    
    fig = plt.figure(figsize=(20,20))
    for i,dd in zip([1,2],[dfA,dfB]):
        ax1 = fig.add_subplot(1,2,i)
        arr=np.arange(len(dd))+.5

        ax1.barh(arr, dd["-log10(p)"].tolist(), color='black', edgecolor='black')#range(0,len(test))

        ax1.tick_params(
            axis='y',
            which='both',
            left='off',
            right='off',
            labelleft='on')

        ax1.tick_params(
            axis='x',
            which='both',
            bottom='on',
            top='off',
            labelbottom='on',
            labeltop='off')

        ax1.set_ylim(ymax = max(arr) + 1.5 ) #1.5
        ax1.set_xlabel("-log10(p)")
        ax1.xaxis.set_label_position('bottom')

        ax1.spines['right'].set_visible(False)
        ax1.spines['top'].set_visible(False)
        
        labels=[]
        for l in dd["termName"].tolist():
            if len(l)>35:
                res=l[:len(l)//2]+"\n"+l[len(l)//2:]
            else:
                res=l
            labels.append(res)
        
        ax1.set_yticks(arr+0.4)
        ax1.set_yticklabels(labels)
        
    fig.suptitle(title, fontsize=20)
    plt.savefig(figName+".png",dpi=300,bbox_inches='tight', pad_inches=0.1,format='png')
    plt.savefig(figName+".svg",dpi=300,bbox_inches='tight', pad_inches=0.1,format='svg')
    plt.show()


DAVIDfiles=os.listdir(outFolder)
DAVIDfiles=[ s for s in DAVIDfiles if "DAVID" in s ]

DAVIDfiles_=os.listdir(FIMO_OUT)
DAVIDfiles_=[ s for s in DAVIDfiles_ if "DAVID" in s ]

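# Plot enrichments for every DAVID report. Files that cannot be read or plotted
# (for example when the "GOTERM_BP_FAT" sheet is missing) are reported by name.
for f in DAVIDfiles:
    try:
        df=pd.read_excel(outFolder+f,"GOTERM_BP_FAT")
        EnrPlots(df,f.split(".xlsx")[0],outFigures+f.split(".xlsx")[0])
    except Exception:
        print f

# Same for the reports of the FIMO-filtered analysis
for f in DAVIDfiles_:
    try:
        df=pd.read_excel(FIMO_OUT+f,"GOTERM_BP_FAT")
        EnrPlots(df,f.split(".xlsx")[0],FIMO_OUT+f.split(".xlsx")[0])
    except Exception:
        print f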
2016-10-29 16:30:47.948926
diff.gene.expression.DAVID.D.xlsx
diff.transcrits.expression.DAVID.D.xlsx
diff.transcrits.expression.DAVID.E.xlsx
target.genes.DAVID.A.xlsx
target.genes.DAVID.D.xlsx
In [70]:
print datetime.now()
2016-10-29 16:33:34.440869